Re: [R] Sampling the Distance Matrix
On Sep 25, 2015, at 12:54 PM, Lorenzo Isella wrote:

> Apologies for not letting this thread rest in peace.
> The small script
>
> #
> set.seed(1234)
>
> x <- rnorm(20)
> y <- rnorm(20)
>
> goodcls <- apply(mtxcomb, 2, function(idx) all(dist(cbind(x[idx], y[idx])) > 0.9))
>
> mycomb <- mtxcomb[, goodcls]
> #
>
> is perfect for detecting groups of 5 points whose distances to each other
> are always above 0.9.
> However, in my practical case I have about 500 points and I am looking
> for a subset of several tens of points whose distances are above a given
> threshold.
> Unfortunately, the approach above does not scale, so I wonder if
> anybody is aware of an alternative approach.

Find the center of the distribution, eliminate all the points within some reasonable radius, perhaps sqrt( sd(x)^2 + sd(y)^2 ), and then work on the reduced set. If you needed to reduce it even further, I could imagine sampling in sectors defined by tan(x/y).

--
David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
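One possible reading of the suggestion above, as a rough sketch only. It assumes 500 simulated points, interprets "sectors" as angular sectors around the center (computed with atan2(), which is my interpretation and not what was written), and uses a hypothetical n_sectors of 8. The candidates it returns still have to be checked against the threshold.

```r
set.seed(1234)
n <- 500
x <- rnorm(n)
y <- rnorm(n)

# Distance of every point from the center of the cloud
cx <- mean(x)
cy <- mean(y)
r <- sqrt((x - cx)^2 + (y - cy)^2)

# Keep only the points outside the suggested radius
radius <- sqrt(sd(x)^2 + sd(y)^2)
outer <- which(r > radius)

# Assign each remaining point to an angular sector (n_sectors is an assumption)
n_sectors <- 8
angle <- atan2(y[outer] - cy, x[outer] - cx)
sector <- cut(angle, breaks = seq(-pi, pi, length.out = n_sectors + 1),
              include.lowest = TRUE, labels = FALSE)

# Take the outermost point of each non-empty sector as a candidate
cand <- unlist(lapply(split(outer, sector), function(ix) ix[which.max(r[ix])]))

# The candidates are not guaranteed to satisfy the criterion, so check them
all(dist(cbind(x[cand], y[cand])) > 0.9)
```

Taking one point per sector is only one way to sample; drawing a random point per sector would fit the description equally well.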
Re: [R] Sampling the Distance Matrix
Absolutely right! Thanks to both Davids for their help.
Cheers

Lorenzo

On Fri, Sep 25, 2015 at 01:54:54PM, David L Carlson wrote:
> You defined x and y in your original email as:
>
> x<-rnorm(20)
> y<-rnorm(20)
>
> mm<-as.matrix(cbind(x,y))
>
> dst<-(dist(mm))
Re: [R] Sampling the Distance Matrix
Apologies for not letting this thread rest in peace.
The small script

#
set.seed(1234)

x <- rnorm(20)
y <- rnorm(20)

goodcls <- apply(mtxcomb, 2, function(idx) all(dist(cbind(x[idx], y[idx])) > 0.9))

mycomb <- mtxcomb[, goodcls]
#

is perfect for detecting groups of 5 points whose distances to each other
are always above 0.9.
However, in my practical case I have about 500 points and I am looking
for a subset of several tens of points whose pairwise distances are all
above a given threshold.
Unfortunately, the approach above does not scale, so I wonder if
anybody is aware of an alternative approach.
Many thanks

Lorenzo
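For what it is worth, a greedy scan is one alternative that scales to a few hundred points. This is only a sketch and not something proposed in the thread: greedy_far_subset() is a hypothetical helper, and it returns one valid subset without any guarantee that it is the largest possible one.

```r
# Greedy sketch (not an exhaustive search): visit the points in random order
# and keep a point only if it is farther than `thr` from every point kept so far.
greedy_far_subset <- function(x, y, thr, k = Inf) {
  dm <- as.matrix(dist(cbind(x, y)))   # full n x n distance matrix, computed once
  keep <- integer(0)
  for (i in sample(length(x))) {
    if (all(dm[i, keep] > thr)) keep <- c(keep, i)
    if (length(keep) >= k) break       # stop once k points have been found
  }
  sort(keep)
}

set.seed(1234)
x <- rnorm(500)
y <- rnorm(500)
sel <- greedy_far_subset(x, y, thr = 0.9)
length(sel)

# Every pairwise distance within the selection exceeds the threshold
all(dist(cbind(x[sel], y[sel])) > 0.9)
```

Re-running with different seeds gives different subsets, so one can keep the largest of several runs if the subset size matters.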
Re: [R] Sampling the Distance Matrix
You defined x and y in your original email as:

> x<-rnorm(20)
> y<-rnorm(20)
>
> mm<-as.matrix(cbind(x,y))
>
> dst<-(dist(mm))

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Thursday, September 24, 2015 6:30 PM
To: Lorenzo Isella
Cc: David L Carlson; r-help@r-project.org
Subject: Re: [R] Sampling the Distance Matrix

On Sep 24, 2015, at 1:54 PM, Lorenzo Isella wrote:

> Hi,
> Thanks for your reply.
> I think I am getting there, but when I run your commands, I get this
> error message
>
> Error in cbind(x[idx], y[idx]) : object 'x' not found
>
> Any idea why? Should I combine those 3 lines with something else?

No idea. I was running the setup that you asked for in your original message, which you have now omitted from the mail chain.

David Winsemius
Alameda, CA, USA
Re: [R] Sampling the Distance Matrix
Hi,
And thanks for your reply.
Essentially, your script gets the job done.
For instance, if I run

mm <- cbind(5/(1:5), -2*sqrt(1:5))
dst <- dist(mm)
dst2 <- as.matrix(dst)
diag(dst2) <- NA
idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))

then it correctly detects the first two rows, where all the values are
larger than 0.9.
In other words, it detects the points that are more than 0.9 units away
from *all* the other points.
My other question (I did not realize this until I got your answer) is
the following: I have the distance matrix of a set of N points.
You gave me an algorithm to find all the points that are more than 0.9
units away from every other point.
However, in some cases even a weaker condition is OK for me: find
a subset of k points (with k tunable) whose distances *from each other*
are greater than 0.9 units (even if their distances from some other
points may be smaller than 0.9).
Any idea about how to tackle that? Is it simply a matter of detecting
the row and column numbers of all the entries of the distance matrix
larger than 0.9?
Many thanks

Lorenzo

On Wed, Sep 23, 2015 at 09:23:04PM, David L Carlson wrote:
> I think the OP wanted rows where all values were greater than .9.
> If so, this works:
>
> set.seed(42)
> dst <- dist(cbind(rnorm(20), rnorm(20)))
> dst2 <- as.matrix(dst)
> diag(dst2) <- NA
> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
Re: [R] Sampling the Distance Matrix
On Sep 24, 2015, at 1:54 PM, Lorenzo Isella wrote:

> On Thu, Sep 24, 2015 at 01:30:02PM -0700, David Winsemius wrote:
>>
>> mtxcomb <- combn(1:20, 5)
>> goodcls <- apply(mtxcomb , 2, function(idx) all( dist( cbind( x[idx],
>> y[idx]) ) > 0.9))
>> mtxcomb [ , goodcls]
>
> Hi,
> Thanks for your reply.
> I think I am getting there, but when I run your commands, I get this
> error message
>
> Error in cbind(x[idx], y[idx]) : object 'x' not found
>
> Any idea why? Should I combine those 3 lines with something else?

No idea. I was running the setup that you asked for in your original message, which you have now omitted from the mail chain.

> Cheers
>
> Lorenzo

David Winsemius
Alameda, CA, USA
Re: [R] Sampling the Distance Matrix
On Sep 24, 2015, at 12:36 PM, Lorenzo Isella wrote:

> My other question (I did not realize this until I got your answer) is
> the following: I have the distance matrix of a set of N points.
> You gave me an algorithm to find all the points that are more than 0.9
> units away from every other point.
> However, in some cases even a weaker condition is OK for me: find
> a subset of k points (with k tunable) whose distances *from each other*
> are greater than 0.9 units (even if their distances from some other
> points may be smaller than 0.9).

If I understand ... Make a matrix of unique combinations, then apply over its columns to flag the combinations that satisfy the distance criterion:

mtxcomb <- combn(1:20, 5)
goodcls <- apply(mtxcomb, 2, function(idx) all(dist(cbind(x[idx], y[idx])) > 0.9))
mtxcomb[, goodcls]

In my sample it was around 9% of the total 5-item combinations.

snipped a lot of output:
.
     [,1440] [,1441]
[1,]      12      13
[2,]      13      16
[3,]      16      17
[4,]      19      19
[5,]      20      20
> dim( mtxcomb)
[1]     5 15504

--
David

> Any idea about how to tackle that? Is it simply a matter of detecting
> the row and column numbers of all the entries of the distance matrix
> larger than 0.9?
> Many thanks
>
> Lorenzo
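A small variation on the combn() approach, offered as a sketch: compute the distance matrix once and index into it, instead of calling dist() for every combination. The seed and the names dm and mycomb are assumptions for illustration. The last lines show why full enumeration stops being practical as n and k grow.

```r
set.seed(1234)
x <- rnorm(20)
y <- rnorm(20)

dm <- as.matrix(dist(cbind(x, y)))   # all pairwise distances, computed once
mtxcomb <- combn(1:20, 5)            # one column per 5-point combination

# A combination qualifies when every off-diagonal distance exceeds 0.9
goodcls <- apply(mtxcomb, 2, function(idx) {
  sub <- dm[idx, idx]
  all(sub[upper.tri(sub)] > 0.9)
})
mycomb <- mtxcomb[, goodcls, drop = FALSE]
ncol(mycomb)

# The number of combinations to test grows very quickly with n and k
choose(20, 5)
choose(500, 30)
```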
Re: [R] Sampling the Distance Matrix
On Thu, Sep 24, 2015 at 01:30:02PM -0700, David Winsemius wrote:
> mtxcomb <- combn(1:20, 5)
> goodcls <- apply(mtxcomb , 2, function(idx) all( dist( cbind( x[idx],
> y[idx]) ) > 0.9))
> mtxcomb [ , goodcls]
>
> In my sample it was around 9% of the total 5 item combinations.

Hi,
Thanks for your reply.
I think I am getting there, but when I run your commands, I get this
error message

Error in cbind(x[idx], y[idx]) : object 'x' not found

Any idea why? Should I combine those 3 lines with something else?
Cheers

Lorenzo
Re: [R] Sampling the Distance Matrix
I think the OP wanted rows where all values were greater than .9. If so, this works:

> set.seed(42)
> dst <- dist(cbind(rnorm(20), rnorm(20)))
> dst2 <- as.matrix(dst)
> diag(dst2) <- NA
> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
> idx
13 18 19
13 18 19
> dst2[idx, idx]
         13       18       19
13       NA 2.272407 3.606054
18 2.272407       NA 1.578150
19 3.606054 1.578150       NA

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of William Dunlap
Sent: Wednesday, September 23, 2015 3:23 PM
To: Lorenzo Isella
Cc: r-help@r-project.org
Subject: Re: [R] Sampling the Distance Matrix
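The same selection can also be written in terms of each point's nearest neighbour, which some may find easier to read. This is a sketch of an equivalent formulation; the name nearest is mine.

```r
set.seed(42)
dst2 <- as.matrix(dist(cbind(rnorm(20), rnorm(20))))
diag(dst2) <- NA

# Smallest distance from each point to any other point
nearest <- apply(dst2, 1, min, na.rm = TRUE)

# A point is isolated when even its nearest neighbour is farther than 0.9
idx <- which(nearest > 0.9)
idx
```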
Re: [R] Sampling the Distance Matrix
> mm <- cbind(1/(1:5), sqrt(1:5))
> d <- dist(mm)
> d
          1         2         3         4
2 0.6492864
3 0.9901226 0.3588848
4 1.2500000 0.6369033 0.2806086
5 1.4723668 0.8748970 0.5213550 0.2413050
> which(as.matrix(d)>0.9, arr.ind=TRUE)
  row col
3   3   1
4   4   1
5   5   1
1   1   3
1   1   4
1   1   5

I.e., the distances between mm's rows 3 & 1, 4 & 1, and 5 & 1 are more than 0.9.

The as.matrix(d) is needed because dist returns the lower triangle of the distance
matrix as an object of class "dist", and as.matrix.dist converts that
into a matrix.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Sep 23, 2015 at 12:15 PM, Lorenzo Isella wrote:
> Dear All,
> Suppose you have a distance matrix stored like a dist object, for
> instance
>
> x<-rnorm(20)
> y<-rnorm(20)
>
> mm<-as.matrix(cbind(x,y))
>
> dst<-(dist(mm))
>
> Now, my problem is the following: I would like to get the rows of mm
> corresponding to points whose distance is always larger than, let's say,
> 0.9.
> In other words, if I were to compute the distance matrix on those
> selected rows of mm, apart from the diagonal, I would get all entries
> larger than 0.9.
> Any idea about how I can efficiently code that?
> Regards
>
> Lorenzo
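Since the full matrix is symmetric, every qualifying pair shows up twice in the which() output. A sketch of one way to list each pair only once, by blanking the diagonal and upper triangle first (the name far_pairs is an assumption):

```r
mm <- cbind(1/(1:5), sqrt(1:5))
dm <- as.matrix(dist(mm))

# Blank out the diagonal and the upper triangle so each pair appears once
dm[upper.tri(dm, diag = TRUE)] <- NA

# which() treats NA as FALSE, so only lower-triangle entries are reported
far_pairs <- which(dm > 0.9, arr.ind = TRUE)
far_pairs
```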
Re: [R] sampling rows with values never sampled before
If df is the data.frame with values and you want nn samples, then this is a slightly different approach:

# example data.frame:
df = data.frame(a1 = sample(1:20, 50, replace = TRUE),
                a2 = sample(seq(0.1, 10, length.out = 30), 50, replace = TRUE),
                a3 = sample(seq(0.3, 20, length.out = 20), 50, replace = TRUE))
nn = 5             # number of samples wanted (example value)
nrow = dim(df)[1]  # 50
ncol = dim(df)[2]  # 3

# start by randomizing the order in your data.frame
randomOrder = sample(1:nrow, nrow, replace = FALSE)
dff = df[randomOrder,]

# find and remove all duplicates from all columns. With this you will
# only keep the first instance of any unique value:
rem = NULL
for (ic in 1:ncol) rem = c(rem, which(duplicated(dff[, ic])))
if (length(rem) > 0) dff = dff[-unique(rem),]

# Reduce to the length you need
if (dim(dff)[1] > nn) res = dff[1:nn,] else res = dff

I am not sure how this scales if you have really big data, or whether you could get some FAQ 7.31 problems depending on how you fill your data.frame.

Cheers,
Jon

--
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Climate Risk Management Unit
Via Fermi 2749, TP 100-01, I-21027 Ispra (VA), ITALY
jon.sko...@jrc.ec.europa.eu
Tel: +39 0332 789205

Disclaimer: Views expressed in this email are those of the individual and do not necessarily represent official views of the European Commission.
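The same steps can be wrapped in a function so they are easy to rerun and check. This is just a sketch: sample_unique_rows() is a hypothetical name, and the seed is only there to make the example reproducible.

```r
# Hypothetical helper: shuffle the rows, drop every row that repeats an
# earlier value in any column, then keep at most nn rows.
sample_unique_rows <- function(df, nn) {
  dff <- df[sample(nrow(df)), , drop = FALSE]
  dup <- Reduce(`|`, lapply(dff, duplicated))   # TRUE if any column repeats
  dff <- dff[!dup, , drop = FALSE]
  head(dff, nn)
}

set.seed(1)
df <- data.frame(a1 = sample(1:20, 50, replace = TRUE),
                 a2 = sample(seq(0.1, 10, length.out = 30), 50, replace = TRUE),
                 a3 = sample(seq(0.3, 20, length.out = 20), 50, replace = TRUE))
res <- sample_unique_rows(df, nn = 5)
res

# anyDuplicated() returns 0 for a column without repeated values
sapply(res, anyDuplicated)
```

Note that the filter is conservative: a row is dropped even when the earlier row it clashes with was itself dropped, so fewer than nn rows may come back.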
Re: [R] sampling rows with values never sampled before
Mike,

There may be a more efficient way to do this, but this works on your example.

# mix up the order of the rows
mix <- dat[order(runif(dim(dat)[1])), ]

# get rid of duplicate x1s and x2s
sub <- mix[!duplicated(mix[, "x1"]) & !duplicated(mix[, "x2"]), ]
sub

Jean
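A quick way to convince oneself that this filter never repeats a coordinate, run on the example data from the original question. The seed and the use of sample() for shuffling are my choices for this sketch.

```r
set.seed(1)
dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))

# Shuffle the rows, then keep rows that are the first occurrence of both
# their x1 value and their x2 value
mix <- dat[sample(nrow(dat)), ]
sub <- mix[!duplicated(mix[, "x1"]) & !duplicated(mix[, "x2"]), , drop = FALSE]
sub

# Neither coordinate repeats among the kept rows
anyDuplicated(sub[, "x1"]) == 0 && anyDuplicated(sub[, "x2"]) == 0
```

The number of rows kept varies from run to run; with this data it is between 1 and 3, because x2 has only 3 distinct values.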
Re: [R] sampling rows with values never sampled before
Hi Jean,

Thanks!

Daniel,
Yes, you are absolutely right. I want sampled vectors to be as different as possible.

I added a little more to the earlier data set.

      x1  x2  x3
 [1,]  1 3.7 2.1
 [2,]  2 3.7 5.3
 [3,]  3 3.7 6.2
 [4,]  4 3.7 8.9
 [5,]  5 3.7 4.1
 [6,]  1 2.9 2.1
 [7,]  2 2.9 5.3
 [8,]  3 2.9 6.2
 [9,]  4 2.9 8.9
[10,]  5 2.9 4.1
[11,]  1 5.2 2.1
[12,]  2 5.2 5.3
[13,]  3 5.2 6.2
[14,]  4 5.2 8.9
[15,]  5 5.2 4.1

If I sampled rows 1, 6, and 11, solving the system of equations would not be possible. So, I am avoiding similar vectors.

Thanks,

Mike
Re: [R] sampling rows with values never sampled before
On 6/22/2015 9:42 AM, C W wrote:
> Hello R list,
>
> I have a question about sampling unique coordinate values.
>
> Here's what my data look like:
>
>     dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each=5))
>     dat
>           x1  x2
>      [1,]  1 3.7
>      [2,]  2 3.7
>      [3,]  3 3.7
>      [4,]  4 3.7
>      [5,]  5 3.7
>      [6,]  1 2.9
>      [7,]  2 2.9
>      [8,]  3 2.9
>      [9,]  4 2.9
>     [10,]  5 2.9
>     [11,]  1 5.2
>     [12,]  2 5.2
>     [13,]  3 5.2
>     [14,]  4 5.2
>     [15,]  5 5.2
>
> If I sampled (1, 3.7), then I don't want (1, 2.9) or (2, 3.7). I want to avoid repeating either the first or the second coordinate, because a repeat leads to an undefined matrix inversion.
>
> I thought of using sample(), but I am not sure about applying it to a data frame.
>
> Thanks in advance,
> Mike

I am not sure you gave us enough information to solve your real-world problem, but I have a few comments and a potential solution.

1. In your example, the unique values in x1 are completely crossed with the unique values in x2.
2. Since you don't want duplicates of either number, the maximum number of samples you can take is the smaller of the two counts of unique values in x1 and x2 (in this case 3, from x2).
3. Sample without replacement from the smaller set of unique values first.
4. Sample without replacement from the larger set second.

    x <- 1:5
    xx <- c(3.7, 2.9, 5.2)
    s2 <- sample(xx, 2, replace=FALSE)
    s1 <- sample(x, 2, replace=FALSE)
    samp <- cbind(s1, s2)
    samp
         s1  s2
    [1,]  5 3.7
    [2,]  1 5.2

Your actual data is probably larger, and the unique values in each vector may not be completely crossed, in which case the task is a little harder. In that case, you could remove values from your data as you sample. This may not be efficient, but it will work.

    smpl <- function(dat, size){
      mysamp <- numeric(0)
      for(i in 1:size) {
        s <- dat[sample(nrow(dat), 1), ]
        mysamp <- rbind(mysamp, s, deparse.level=0)
        # drop=FALSE keeps dat a matrix even when a single row is left
        dat <- dat[!(dat[,1]==s[1] | dat[,2]==s[2]), , drop=FALSE]
      }
      mysamp
    }

This is just an example of how you might approach your real-world problem. There is no error checking, and for large samples it may not scale well.
Hope this is helpful,

Dan

-- 
Daniel Nordlund
Bothell, WA USA
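The remove-as-you-sample approach described in this message could be sketched with a guard for the case where no admissible rows remain. The function name sample_distinct and the seed are hypothetical, introduced only for this illustration; this is not the code posted in the thread.

```r
# Hypothetical helper: like the row-removal loop above, but it stops early
# when there is nothing left to draw from.
sample_distinct <- function(dat, size) {
  picked <- dat[0, , drop = FALSE]            # empty result, same columns
  for (i in seq_len(size)) {
    if (nrow(dat) == 0) break                 # no admissible rows remain
    s <- dat[sample.int(nrow(dat), 1), , drop = FALSE]
    picked <- rbind(picked, s)
    # remove every row that shares either coordinate with the row just drawn
    clash <- dat[, 1] == s[1, 1] | dat[, 2] == s[1, 2]
    dat <- dat[!clash, , drop = FALSE]
  }
  picked
}

dat <- cbind(x1 = rep(1:5, 3), x2 = rep(c(3.7, 2.9, 5.2), each = 5))
set.seed(42)                                  # arbitrary seed
res <- sample_distinct(dat, 3)
res
```

On the fully crossed example, asking for more rows than there are unique x2 values simply returns the three rows that are possible instead of failing.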
Re: [R] Sampling
On 3/29/2015 11:10 PM, Partha Sinha wrote:
> I have 1000 data points. I want to take 30 samples and find the mean. I also want to repeat this process 100 times. How do I go about it?
>
> Regards
> Parth

See ?replicate and ?sample. Here is a simple example where yourdata is a plain vector of values, assuming you want to sample without replacement. Generalizing it to other data structures is left as an exercise for the reader.

    replicate(100, mean(sample(yourdata, 30, replace=FALSE)))

Hope this is helpful,

Dan

-- 
Daniel Nordlund
Bothell, WA USA
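To try the one-liner above without real data, a stand-in vector can be generated first. The normal distribution, its parameters, and the seed below are assumptions chosen only so the sketch runs; yourdata is the name used in the reply.

```r
set.seed(123)                                  # arbitrary seed
yourdata <- rnorm(1000, mean = 50, sd = 10)    # stand-in for the 1000 data points

# 100 repetitions: draw 30 values without replacement and take their mean
sample_means <- replicate(100, mean(sample(yourdata, 30, replace = FALSE)))

length(sample_means)   # 100
summary(sample_means)
```

Each repetition draws a fresh sample from the full vector, so the same data point may appear in several of the 100 samples even though no sample contains it twice.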
Re: [R] sampling dataframe based upon number of record occurrences
I'm not sure I understand, but I think you have a large data frame with records and you want to construct a sample of that data frame that includes no more than 3 records for each IDbyYear combination? You say there are 5589 unique combinations and your code uses a data frame called fitting_set. Assuming this is the data frame you are describing, your code will select all of the lines since fitting_set$IDbyYear[i] is always a vector of length 1.

We need a reproducible example. The best way for you to give us that would be to copy the result of dput(head(fitting_set, 10)). It would look something like this plus the 6 other columns you mention, except that I've added "dta <-" in front of structure() to create a data frame:

    dta <- structure(list(IDbyYear = c(42.24, 42.24, 42.24, 42.24, 42.24,
        42.24, 45.32, 45.32, 45.36, 45.4, 45.4),
        SiteID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
            .Label = c("A-Airport", "A-Bark Corral East"), class = "factor"),
        Year = c(2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2008L, 2008L,
            2009L, 2010L, 2010L)),
        .Names = c("IDbyYear", "SiteID", "Year"),
        class = "data.frame", row.names = c(NA, -11L))

Now create a list of data frames, one for each IDbyYear:

    dta.list <- split(dta, dta$IDbyYear)

Now a function that will select 3 rows, or all of them if there are fewer:

    smp <- function(dframe) {
        ind <- seq_len(nrow(dframe))
        dframe[sample(ind, ifelse(length(ind) > 2, 3, length(ind))), ]
    }

Now take the samples and combine them into a single data frame:

    sample <- do.call(rbind, lapply(dta.list, smp))
    sample

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Curtis Burkhalter
Sent: Tuesday, March 3, 2015 3:23 PM
To: r-help@r-project.org
Subject: [R] sampling dataframe based upon number of record occurrences

Hello everyone,

I'm having trouble performing a task that is probably very simple, but I can't seem to figure out how to get my code to work. What I want to do is use the sample function to pick records within a data frame, but only if a column attribute value is repeated more than 3 times. If you look at the data below, I have created a unique attribute value that corresponds to every site-by-year combination (i.e. IDxYear). You can see that the site called A-Airport was sampled 6 times in 2006, and A-Bark Corral East was sampled twice in 2008. What I want to do is randomly select 3 records for A-Airport in 2006 from the existing 6 records, but for A-Bark Corral East in 2008 I just want to leave the records as they currently are.

I've used the following code to try to accomplish this, but like I said I can't get it to work, so I'm clearly doing something wrong. If you could check out the code and provide any suggestions, that would be great. It should be noted that there are 5589 unique IDxYear combinations, so that's why that number is in the code. If any further clarification is needed, let me know.

    boom=data.frame()
    for (i in 1:5589){
      boom[i,]=ifelse(length(fitting_set$IDbyYear[i]>3),fitting_set[sample(nrow(fitting_set),3),],fitting_set)
    }
    boom

    IDbyYear  SiteID              Year  (6 other column attributes)
    42.24     A-Airport           2006
    42.24     A-Airport           2006
    42.24     A-Airport           2006
    42.24     A-Airport           2006
    42.24     A-Airport           2006
    42.24     A-Airport           2006
    45.32     A-Bark Corral East  2008
    45.32     A-Bark Corral East  2008
    45.36     A-Bark Corral East  2009
    45.40     A-Bark Corral East  2010
    45.40     A-Bark Corral East  2010

Thanks

-- 
Curtis Burkhalter
https://sites.google.com/site/curtisburkhalter/
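The "at most 3 records per IDbyYear" rule discussed in this message can also be sketched by sampling row indices within each group. This is an illustrative variant, not the code from the thread: the small data frame re-creates only three of the columns, and the names rows and capped and the seed are assumptions.

```r
# Small stand-in for fitting_set (three columns only; values from the post).
dta <- data.frame(
  IDbyYear = c(rep(42.24, 6), 45.32, 45.32, 45.36, 45.40, 45.40),
  SiteID   = c(rep("A-Airport", 6), rep("A-Bark Corral East", 5)),
  Year     = c(rep(2006L, 6), 2008L, 2008L, 2009L, 2010L, 2010L)
)

set.seed(2015)                                 # arbitrary seed
# For each IDbyYear group, pick min(3, group size) of its row indices.
rows <- unlist(lapply(split(seq_len(nrow(dta)), dta$IDbyYear), function(idx) {
  idx[sample.int(length(idx), min(3, length(idx)))]
}))
capped <- dta[sort(rows), ]

table(capped$IDbyYear)
```

Indexing with sample.int(length(idx), k) rather than sample(idx, k) sidesteps the behaviour where sample() treats a single number n as the range 1:n, which matters for groups containing a single row.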
Re: [R] sampling dataframe based upon number of record occurrences
Here is an implementation with a function named getSample. Some modification to the data was made so that it can be read as a table.

    fitting.set
       IDbyYear             SiteID Year
    1     42.24          A-Airport 2006
    2     42.24          A-Airport 2006
    3     42.24          A-Airport 2006
    4     42.24          A-Airport 2006
    5     42.24          A-Airport 2006
    6     42.24          A-Airport 2006
    7     45.32 A-Bark.Corral.East 2008
    8     45.32 A-Bark.Corral.East 2008
    9     45.36 A-Bark.Corral.East 2009
    10    45.40 A-Bark.Corral.East 2010
    11    45.40 A-Bark.Corral.East 2010

    getSample <- function(x) {
      sites <- unique(x$SiteID)
      years <- unique(x$Year)
      result <- data.frame()
      x$ID <- seq(1, nrow(x))
      for (i in 1:length(sites)) {
        for (j in 1:length(years)) {
          if (nrow(x[as.character(x$SiteID)==as.character(sites[i]) & x$Year==years[j],]) > 3) {
            sampledID <- sample(x[as.character(x$SiteID)==as.character(sites[i]) & x$Year==years[j],]$ID, 3, replace=FALSE)
            for (k in 1:length(sampledID)) {
              result <- rbind(result, x[x$ID==sampledID[k], -4])
            }
          }
        }
      }
      names(result) <- c("IDbyYear", "SiteID", "Year")
      rownames(result) <- NULL
      return(result)
    }

    getSample(fitting.set)
      IDbyYear    SiteID Year
    1    42.24 A-Airport 2006
    2    42.24 A-Airport 2006
    3    42.24 A-Airport 2006
Re: [R] sampling dataframe based upon number of record occurrences
That worked great, thanks so much David!

-- 
Curtis Burkhalter
https://sites.google.com/site/curtisburkhalter/
Re: [R] sampling dataframe based upon number of record occurrences
Since you indicated there are six more columns in the data frame, getSample is modified below to take care of that.

    getSample <- function(x) {
      sites <- unique(x$SiteID)
      years <- unique(x$Year)
      result <- data.frame()
      x$ID <- seq(1, nrow(x))
      for (i in 1:length(sites)) {
        for (j in 1:length(years)) {
          if (nrow(x[as.character(x$SiteID)==as.character(sites[i]) & x$Year==years[j],]) > 3) {
            sampledID <- sample(x[as.character(x$SiteID)==as.character(sites[i]) & x$Year==years[j],]$ID, 3, replace=FALSE)
            for (k in 1:length(sampledID)) {
              # drop the helper ID column, whatever its position
              result <- rbind(result, x[x$ID==sampledID[k], -ncol(x)])
            }
          }
        }
      }
      names(result) <- names(x)[-ncol(x)]
      rownames(result) <- NULL
      return(result)
    }

    getSample(fitting.set)
      IDbyYear    SiteID Year
    1    42.24 A-Airport 2006
    2    42.24 A-Airport 2006
    3    42.24 A-Airport 2006
Re: [R] Sampling according to type
If I understood correctly, you need weighted sampling. Try the 'prob' argument of 'sample'. For your example:

    n <- 10
    ntype <- rbinom(n, 1, 0.5)
    myProbs <- rep(1/10, 10)                 # equally likely
    myProbs[ which(ntype == 0)] <- 0.75/7    # Divide so the sum will be 1.0
    myProbs[ which(ntype == 1)] <- 0.25/3
    sample(ntype, 3, prob=myProbs)

On 5 March 2014 15:20, Thomas <thomas.ches...@nottingham.ac.uk> wrote:
> I have a matrix where each entry represents a data subject's type, 1 or 0:
>
>     n <- 10
>     ntype <- rbinom(n, 1, 0.5)
>
> and I'd like to sample, say, 3 subjects from ntype, where those subjects who are Type 1 are selected with probability, say, 0.75, and Type 0 with (1-0.75). (So the sample would produce a list with three indices, each referring to a position within ntype.)
>
> Can anyone suggest a way to do this please?
>
> Thank you,
>
> Thomas Chesney
Re: [R] Sampling according to type
>     myProbs[ which(ntype == 0)] <- 0.75/7    # Divide so the sum will be 1.0
>     myProbs[ which(ntype == 1)] <- 0.25/3

Here, of course, you need to divide by the actual number of 0s and 1s; 7 and 3 were just an example.
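Putting the two messages of this thread together, the weights can be derived from the observed counts. The sketch below makes some assumptions of its own: ntype is a fixed stand-in vector so that both types are present, Type 1 is given the total weight 0.75 as in the question (the snippet in the reply attaches 0.75 to type 0), and positions are sampled instead of the 0/1 values because the question asks for indices.

```r
# Fixed stand-in for rbinom(n, 1, 0.5), so that both types are present.
ntype <- c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0)

n1 <- sum(ntype == 1)                  # number of Type 1 subjects (4 here)
n0 <- sum(ntype == 0)                  # number of Type 0 subjects (6 here)
stopifnot(n1 > 0, n0 > 0)              # the split below needs both types

# Type 1 subjects share a total weight of 0.75, Type 0 subjects share 0.25.
myProbs <- ifelse(ntype == 1, 0.75 / n1, 0.25 / n0)

set.seed(99)                           # arbitrary seed
idx <- sample(seq_along(ntype), 3, prob = myProbs)   # positions within ntype
idx
ntype[idx]
```

Because sample() draws without replacement by default, the weights apply exactly only to the first draw; later draws are made from the renormalised weights of the remaining subjects, so the overall share of Type 1 in the sample will only be approximately 0.75.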
Re: [R] Sampling question
Hi,

You may try:

    dat1 <- structure(list(SubID = 1:8,
        CSE1 = c(6L, 6L, 5L, 5L, 5L, 5L, 3L, 3L),
        CSE2 = c(5L, 4L, 5L, 4L, 6L, 4L, 6L, 6L),
        CSE3 = c(6L, 7L, 5L, 3L, 7L, 3L, 6L, 6L),
        CSE4 = c(2L, 2L, 5L, 4L, 5L, 6L, 3L, 3L),
        WSE1 = c(6L, 6L, 5L, 4L, 6L, 4L, 6L, 6L),
        WSE2 = c(2L, 6L, 5L, 4L, 4L, 3L, 5L, 5L),
        WSE3 = c(2L, 2L, 4L, 5L, 4L, 7L, 2L, 4L),
        WSE4 = c(4L, 3L, 5L, 2L, 1L, 3L, 1L, 7L)),
        .Names = c("SubID", "CSE1", "CSE2", "CSE3", "CSE4",
                   "WSE1", "WSE2", "WSE3", "WSE4"),
        class = "data.frame", row.names = c(NA, -8L))

    fun1 <- function(dat, rep) {
      res <- replicate(rep, {
        lst1 <- lapply(sample(nrow(dat), nrow(dat)),
                       function(x) sample(dat[x, 2:5], 4))
        names(lst1) <- sapply(lst1, row.names)
        lst1[-c(1:2)] <- lapply(names(lst1)[-c(1:2)], function(i) {
          x1 <- dat[i, 6:9][is.na(match(gsub("^.", "", names(dat[i, 6:9])),
                                        gsub("^.", "", names(lst1[[i]][1]))))]
          cbind(lst1[[i]][1], sample(x1, 3))
        })
        do.call(rbind, lapply(lst1, function(x) {
          datNew <- cbind(SubID = as.numeric(row.names(x)), x)
          names(datNew)[-1] <- "var"
          datNew
        }))
      })
      res
    }

    res1 <- fun1(dat1, 5)
    lst2 <- lapply(split(res1, col(res1)), function(x) {
      dat <- do.call(cbind, x)
      colnames(dat) <- c("SubID", rep("var", 4))
      dat
    })
    do.call(cbind, res1[, 1])
    do.call(cbind, res1[, 2])

A.K.

> I have a question about drawing samples from a data frame. This might sound really tricky. Let me use a data frame I posted earlier as an example:
>
>     SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
>         1    6    5    6    2    6    2    2    4
>         2    6    4    7    2    6    6    2    3
>         3    5    5    5    5    5    5    4    5
>         4    5    4    3    4    4    4    5    2
>         5    5    6    7    5    6    4    4    1
>         6    5    4    3    6    4    3    7    3
>         7    3    6    6    3    6    5    2    1
>         8    3    6    6    3    6    5    4    7
>
> This data frame has two sets of variables, and each set represents one scale. As shown above, the first scale, say CSE, consists of four items: CSE1, CSE2, CSE3, and CSE4, whereas the second scale, say WSE, also has four items: WSE1, WSE2, WSE3, and WSE4. The leftmost column lists the subjects' IDs.
>
> I want to create a new data frame by sampling random numbers from the data frame above. Below is the structure of the new data frame.
>
>     SubID var var var var
>         s   c   c   c   c
>         s   c   c   c   c
>         s   c   w   w   w
>         s   c   w   w   w
>         s   c   w   w   w
>         s   c   w   w   w
>         s   c   w   w   w
>         s   c   w   w   w
>
> In the new data frame:
> s = SubID, ranging from 1 to 8
> var = variables
> c = CSE numbers
> w = WSE numbers
>
> Some rules to construct the new data frame:
>
> 1. The top two rows have to be filled with CSE numbers, and the numbers in the cells of each row should be randomized. For example, if the first row is an array of numbers from subject 4, they can follow the order 4 (CSE2), 5 (CSE1), 3 (CSE3), and 4 (CSE4). Also, the numbers in the second row do not have to follow the order of the first row. For example, if the first row is from subject 4 in the order 4 (CSE2), 5 (CSE1), 3 (CSE3), and 4 (CSE4), the numbers in the second row (assuming it is from subject 8) do not have to be 6 (CSE2), 3 (CSE1), 6 (CSE3), and 3 (CSE4). Numbers in these two rows should be drawn without replacement.
>
> 2. Each of the remaining rows should include a CSE number in the leftmost cell and three WSE numbers on the right. In each row, the three WSE numbers have to be only those that do not correspond to the CSE number in the leftmost cell. For example, if the CSE number in the leftmost cell is 4, a CSE2 number from subject 6, the three WSE numbers on the right can only be 4 (WSE1), 7 (WSE3), and 3 (WSE4) from subject 6.
>
> 3. The numbers in each row can only be drawn from the same subject. Also, subjects should be randomized. Specifically, they do not have to be in the order SubID 1 2 3 4 5 6 7 8; they can be, for example, SubID 2 8 5 4 1 6 7 3.
>
> 4. Repeat the whole process 1000 times to draw 1000 random samples.
>
> Any ideas? Thanks in advance!! :)
Re: [R] Sampling Weights and lmer() update?
Arguably you are looking in the wrong place (there's a special mixed-models mailing list for R), but I can answer the question.

No.

At least, there's nothing in lme4, and I haven't done anything (since I want a more general solution than Stata and MLwiN implement), and I'd be surprised if someone else had done it.

-thomas

On Tue, May 14, 2013 at 3:35 PM, Richard Blissett <rsl.bl...@gmail.com> wrote:
> Perhaps I am not looking in the right place, but I am looking for a way to use lmer() to run a multilevel model that incorporates sampling weights. I have used the Lumley survey package to use sampling weights in the past, but according to a post I found online from Thomas Lumley in mid-2012, R is currently not equipped to do this. His post is here:
> http://r.789695.n4.nabble.com/sampling-weights-for-multilevel-models-tp4632947p4632955.html
>
> Does anyone know if there has been an update since then that makes this possible, or if there's another way to go about doing this in R? Otherwise, I am thinking that I will have to move my data over to Stata and try to run the multilevel models there.
>
> Richard
>
> --
> Richard Blissett

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland
Re: [R] Sampling data without having infinite numbers after diong a transformation
Perhaps you should read the help file for rnorm more carefully.

    ?rnorm

Keep in mind that the normal probability distribution is a density function, so the smaller the standard deviation is, the greater the magnitude of the density function is.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Agnes Ayang <agnes.ay...@yahoo.com> wrote:
> Hello R-helpers,
>
> I want to ask how I can sample data sets without infinite values coming out. For example:
>
>     set.seed(1234)
>     a <- rnorm(15,0,1)
>     b <- rnorm(15,0,1)
>     c <- rnorm(15,0,1)
>     d <- rnorm(15,0,36)
>
> After drawing the samples, I need to apply a transformation (Hoaglin, 1985) to each data set. I need to measure the skewness and kurtosis, which is why I need the transformation. After the transformation, there are 'Inf' values in my data sets, and I cannot proceed with the next step, where I need to compute the trimmed mean and the sum of squared deviations.
>
> Can anyone help me obtain better data sets so that my programme will work?
>
> Thank you.
>
> Best regards,
> Hyo Min
> UPM Malaysia
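One thing the rnorm help page makes explicit is that the third argument is a standard deviation, not a variance. The sketch below only illustrates that point; whether a variance of 36 was what the poster intended is an assumption, and the variable names and the larger sample size are introduced here for illustration.

```r
set.seed(1234)
# rnorm(n, mean, sd): the third argument is the standard deviation.
d_sd36  <- rnorm(15, mean = 0, sd = 36)         # what rnorm(15, 0, 36) draws
d_var36 <- rnorm(15, mean = 0, sd = sqrt(36))   # if a variance of 36 was meant

# A larger sample makes the difference in spread easy to see.
big_sd36  <- rnorm(1e5, mean = 0, sd = 36)
big_var36 <- rnorm(1e5, mean = 0, sd = sqrt(36))
c(sd(big_sd36), sd(big_var36))   # roughly 36 and 6
```

With a standard deviation of 36, values are spread six times as widely as with a variance of 36, which may matter for a transformation that overflows on extreme inputs.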
Re: [R] Sampling from a Population
Hi Lorenzo,

This has the feel of a homework problem, but I will suggest to you that this is sampling without replacement, and there exist easy mathematical formulas (no need to resort to R) to calculate your desired probability.

Michael

On Sat, Dec 8, 2012 at 11:54 AM, Lorenzo Isella <lorenzo.ise...@gmail.com> wrote:
> Dear All,
>
> I hope this is not too off topic, but I am sure it has to be a one-liner in R.
>
> Suppose you have a population of size N and that you take a random sample of n_s individuals out of this population. This population includes a subgroup of n_i individuals. For any individual in n_i, what is the probability of being included in the sample n_s?
>
> Many thanks.
>
> Lorenzo
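For simple random sampling without replacement, the counting argument hinted at in the reply can be written out in a few lines of R. The numbers N, n_s and n_i below are illustrative values, and the reading of the question (every subset of size n_s equally likely) is an assumption.

```r
N   <- 1000   # population size (illustrative value)
n_s <- 50     # sample size (illustrative value)
n_i <- 20     # subgroup size (illustrative value)

# Probability that one given individual is in the sample:
# samples that contain that individual, divided by all possible samples.
p_one <- choose(N - 1, n_s - 1) / choose(N, n_s)
all.equal(p_one, n_s / N)

# Probability that at least one member of the subgroup is in the sample,
# using the hypergeometric probability of drawing zero subgroup members.
p_any <- 1 - dhyper(0, n_i, N - n_i, n_s)
p_any
```

Under this reading the inclusion probability of a single individual is n_s / N and does not depend on the subgroup size; n_i only enters questions about the subgroup as a whole.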
Re: [R] sampling weights for multilevel models
Hello,

The link you've posted is to a page that does NOT have a dataset; it has links to other pages. The proper way of posting a data example would be

    # paste the output of this in a post
    dput(head(yourdata, 20))  # or 30

Now, if I understand your question, function sample() does have a weights argument, 'prob'. (Package base.) See

    help(sample)

Hope this helps,

Rui Barradas

On 10-06-2012 20:00, Tamara wrote:
> Dear all,
>
> I am struggling with a problem that I have been reading about on the forums, and it did not seem to me that there is a precise answer to my question. However, I still hope there is one.
>
> I am working with PIRLS data (http://timss.bc.edu/) and trying to conduct a multilevel analysis. There are different weights for each level of analysis in the PIRLS dataset (e.g. there is a school weight, a class weight, and a student weight).
>
> Is there a function in R which would let me use different weights for different levels of my model? If yes, which package contains it?
>
> I would be very grateful for any help!
Re: [R] sampling weights for multilevel models
On Mon, Jun 11, 2012 at 7:00 AM, Tamara <petrova.t...@gmail.com> wrote:
> Dear all,
>
> I am struggling with a problem that I have been reading about on the forums, and it did not seem to me that there is a precise answer to my question. However, I still hope there is one.
>
> I am working with PIRLS data (http://timss.bc.edu/) and trying to conduct a multilevel analysis. There are different weights for each level of analysis in the PIRLS dataset (e.g. there is a school weight, a class weight, and a student weight).
>
> Is there a function in R which would let me use different weights for different levels of my model? If yes, which package contains it?

As far as I know there is no function that does what you want. In particular, lme() and lmer() don't work correctly with sampling weights.

It does depend on why you want a multilevel model. If you are primarily interested in the mean model and the variance components are just needed to get appropriate standard errors, then you can use the svyglm() function in the survey package to fit a linear regression with appropriate standard errors. On the other hand, if you are interested in estimating the variance components for their own sake, you need some other software.

I do have longer-term plans to add multilevel modelling capabilities to the survey package, but it's harder than it may appear.

-thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland
Re: [R] sampling weights for multilevel models
Thank you very much, Rui! But I am afraid that I won't be able to use this function for multilevel analysis, as unfortunately I don't see how exactly I would combine it with the functions in the R packages for multilevel analysis.
Re: [R] sampling weights for multilevel models
Thank you very much, Thomas! As I need to estimate the variance components, I will most probably have to switch from R to HLM or Mplus to apply different weights to different levels, although I prefer R in general.
Re: [R] sampling rows from a list
?? Something like:

lapply(mydata, function(x) {
  nr <- nrow(x)
  x[sample(seq_len(nr), nr, replace = TRUE), ]
})

maybe. The idea is to use the sampled rows as your row index.

-- Bert

On Mon, Apr 2, 2012 at 11:24 AM, Bcampbell99 briand.campb...@ec.gc.ca wrote:
> Hi:
>
> I'm sure this seems like a rudimentary question, but I am not well versed with R syntax for lists. I have a ragged array from which I've removed records (entire rows) with missing data. The functions I used to remove the missing cases resulted in the generation of an R list class object that looks something like this:
>
> mydata
> [[1]]
>      [,1] [,2] [,3]
> [1,]    1    2    3
> [2,]    4    5    6
> [3,]    7    8    9
>
> [[2]]
>      [,1] [,2] [,3]
> [1,]   10   11   12
> [2,]   13   14   15
>
> [[3]]
>      [,1] [,2] [,3]
> [1,]   16   17   18
> [2,]   19   20   21
> [3,]   22   23   24
> [4,]   25   26   27
> [5,]   28   29   30
>
> Part 1: What I would like to do is draw an equal number of random row samples from [[1]], [[2]] and [[3]] (to preserve the structure of [,1], [,2], [,3]).
>
> Part 2: Then I would like to coerce the list object into something like an array.
>
> Help scripting out part 1 or 2 would be much appreciated.
>
> Brian Campbell

--
Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] sampling rows from a list
## recreating your data
mydata <- list(matrix(1:9, nrow=3, byrow=TRUE),
               matrix(10:15, nrow=2, byrow=TRUE),
               matrix(16:30, nrow=5, byrow=TRUE))

## get the number of rows of the shortest matrix in your list
n <- min(unlist(lapply(mydata, nrow)))

## subset the list into random samples of n rows each
out <- lapply(mydata, function(x, n) x[sample(1:nrow(x), n), ], n=n)

## this structure is still a list though...
## converting directly to an array:
out.array <- array(unlist(out), dim=c(dim(out[[1]]), length(out)))

Not totally sure about what structure you're wanting in the last step, so if I missed, I apologize...

Hope that helps,
Justin
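A self-contained variant of the same idea, as a sketch: it adds `drop = FALSE` so that a sample of a single row stays a matrix (an edge case the code above does not hit with this data), and it checks the shape of the result. The name `out_array` is just a label chosen here.

```r
set.seed(42)

# Same toy list of matrices as in the question
mydata <- list(matrix(1:9,   nrow = 3, byrow = TRUE),
               matrix(10:15, nrow = 2, byrow = TRUE),
               matrix(16:30, nrow = 5, byrow = TRUE))

# Largest number of rows that every matrix can supply
n <- min(vapply(mydata, nrow, integer(1)))

# Part 1: sample n rows (without replacement) from each matrix
out <- lapply(mydata, function(m) m[sample(nrow(m), n), , drop = FALSE])

# Part 2: stack the equally sized matrices into an n x 3 x 3 array
out_array <- array(unlist(out), dim = c(dim(out[[1]]), length(out)))
dim(out_array)
```

Slice `out_array[, , k]` then holds the rows sampled from `mydata[[k]]`.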
Re: [R] Sampling problems
Hi, thank you. It does work for vectors and matrices but not for data frames; it gives me this error message:

MeanA <- read.csv("MeanAmf.csv", header=T)
mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]
remainder <- MeanA[-mysample]
Error in `[.default`(MeanA, -mysample) : invalid subscript type 'list'
In Ops.factor(left) : - not meaningful for factors

Any other way?
Re: [R] Sampling problems
Hi Sarah, it is not clear to me how to do that; can you show me, please? Imagine I have a situation like this:

MeanA <- read.csv("MeanAmf.csv", header=T)
mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

Then?
Re: [R] Sampling problems
Hi

I have only a faint idea what your problem was, as there is no context in your message, but maybe

remainder <- MeanA[-mysample, ]

could work.

Regards
Petr
Re: [R] Sampling problems
> Hi, thank you but it does work for vectors and matrix but not dataframes, it gives me this message error:
>
> MeanA <- read.csv("MeanAmf.csv", header=T)
> mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

Well, maybe a slight correction:

mysample <- sample(1:nrow(MeanA), 20, replace=FALSE)
chosen.one <- MeanA[mysample,]
remainder <- MeanA[-mysample,]

Regards
Petr

> remainder <- MeanA[-mysample]
> Error in `[.default`(MeanA, -mysample) : invalid subscript type 'list'
> In Ops.factor(left) : - not meaningful for factors
>
> Any other way?
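The error in the quoted code comes from negating a data frame: there `mysample` holds the sampled rows themselves, not their positions. A small sketch of the corrected pattern on made-up data (the data frame and its columns `id`, `value`, `group` are hypothetical stand-ins for the CSV file):

```r
set.seed(1)

# Hypothetical stand-in for MeanA, including a factor column
MeanA <- data.frame(
  id    = 1:50,
  value = rnorm(50),
  group = factor(sample(c("A", "B"), 50, replace = TRUE))
)

# Keep the sampled row positions, not the sampled rows
idx <- sample(nrow(MeanA), 20)

chosen    <- MeanA[idx, ]    # the 20 sampled rows, with all their columns
remainder <- MeanA[-idx, ]   # the 30 rows that were not sampled
```

The row numbers are only the key; indexing with them returns the full rows, so nothing is lost by storing the index instead of the data.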
Re: [R] Sampling problems
Thanks, but it doesn't work either; it gives me the same error message. It works only if my first sample is taken in this way:

mysample <- sample(1:nrow(MeanA), 20, replace=FALSE)

However, in this way it samples just the row numbers:

[1] 71 24 12 36 2 39 69 62 43 38 9 44 13 54 50 63 67 66 37 28

but not the data inside. I need to sample in this way:

mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

to get a sample like this:

HRkm Mean.mf Mean.mfm Loc Diet Terr Soc Type Soc.Ter W.cat.0.25 W.cat.0.5
-2.49 -0.43 2.57 A O T S D TS b 23 -2.05 0.67 T C N S D NS A

This is an example of my data frame.
Re: [R] Sampling problems
Please use dput() to give a reproducible example. I can make this work on a data frame quite easily:

x <- data.frame(1:10, letters[1:10], rnorm(10))
str(x)
print(x)
x[sample(nrow(x), 5), ]

So it's not a problem with something being a data frame or having factors.

Michael
Re: [R] Sampling problems
You could make a vector containing the number of TRUE values that makes up 80% of your data, and the number of FALSE values that makes up 20% of your data. Use sample() to reorder it, then use it to divide your dataset.

If you had provided a reproducible example, I could write you code.

Sarah

On Wed, Mar 7, 2012 at 11:41 AM, Oritteropus lucasantin...@hotmail.com wrote:
> Hi, I need to sample randomly my dataset for 1000 times. The sample need to be the 80%. I know how to do that, my problem is that not only I need the 80%, but I also need the corresponding 20% each time. Is there any way to do that?
> Alternatively, I was thinking to something like setdiff () function to compare my 80% sample to the original dataset and obtain the corresponding 20%, unfortunately setdiff works just for vectors, do you know a similar function for dataframes?
> Thanks

--
Sarah Goslee
http://www.functionaldiversity.org
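The approach described above could be sketched like this; the data frame `dat` and the names `keep`, `sample80` and `sample20` are invented for the example:

```r
set.seed(123)

# Hypothetical data set with 50 rows
dat <- data.frame(id = 1:50, y = rnorm(50))

n      <- nrow(dat)
n_keep <- round(0.8 * n)

# 80% TRUE and 20% FALSE, shuffled into random order
keep <- sample(rep(c(TRUE, FALSE), times = c(n_keep, n - n_keep)))

sample80 <- dat[keep, ]    # rows flagged TRUE
sample20 <- dat[!keep, ]   # the complementary rows
```

Because one logical vector defines both pieces, the two subsets cannot overlap and together always cover the whole data set.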
Re: [R] Sampling problems
On Wed, Mar 07, 2012 at 08:41:35AM -0800, Oritteropus wrote:
> Hi, I need to sample randomly my dataset for 1000 times. The sample need to be the 80%. I know how to do that, my problem is that not only I need the 80%, but I also need the corresponding 20% each time. Is there any way to do that?

Hi.

If you use sample() to get the 80% and store the indices, you can also get the remaining cases:

a <- matrix(1:30, ncol=3)
i <- sample(10, 8)
a[sort(i), ]
     [,1] [,2] [,3]
[1,]    1   11   21
[2,]    2   12   22
[3,]    3   13   23
[4,]    4   14   24
[5,]    6   16   26
[6,]    7   17   27
[7,]    8   18   28
[8,]   10   20   30
a[-i, ]
     [,1] [,2] [,3]
[1,]    5   15   25
[2,]    9   19   29

Hope this helps.

Petr Savicky.
Re: [R] Sampling problems
On Mar 7, 2012, at 11:41 AM, Oritteropus wrote:
> Hi, I need to sample randomly my dataset for 1000 times. The sample need to be the 80%. I know how to do that, my problem is that not only I need the 80%, but I also need the corresponding 20% each time. Is there any way to do that?
> Alternatively, I was thinking to something like setdiff () function to compare my 80% sample to the original dataset and obtain the corresponding 20%, unfortunately setdiff works just for vectors, do you know a similar function for dataframes?

Create an index vector with runif or sample, then use it to get your sample and use negative indexing to get the remainder.

idx <- sample(1:1000, 800)
x[ idx, ]   # 80%
x[ -idx, ]  # the other 20%

(I think this does presume you have not mucked with the default rownames.)

--
David Winsemius, MD
West Hartford, CT
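Since the original question asks for 1000 such splits, here is one way the index idea might be repeated, as a sketch on made-up data. The data frame `x`, the list `splits` and its `train`/`test` elements are names invented here. As far as I can tell, integer indices select rows by position, so the split itself should not depend on the row names.

```r
set.seed(2012)

# Hypothetical data set with 1000 rows
x <- data.frame(id = 1:1000, y = rnorm(1000))
n_train <- round(0.8 * nrow(x))

# Store only the row indices of each split; setdiff() works on these
# vectors even though it does not work on data frames directly
splits <- replicate(1000, {
  idx <- sample(nrow(x), n_train)
  list(train = idx, test = setdiff(seq_len(nrow(x)), idx))
}, simplify = FALSE)

# Materialise the data for any one split when it is needed
first_train <- x[splits[[1]]$train, ]
first_test  <- x[splits[[1]]$test, ]
```

Keeping indices rather than 1000 copies of the data keeps memory use small and makes each split easy to reproduce.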
Re: [R] Sampling with Constraints for testing and training data
Hi People,

Does anyone have a good solution for this problem? I have a database called DB.

index <- sample(1:nrow(DB), size=0.2*nrow(DB))
test <- DB[index,]
train <- DB[-index,]

One of the variables in this database is a target variable with two values, 0 and 1. Imagine now that I want to constrain the test data frame so that it is 20% of the size of DB and contains equal numbers of rows with DB$target=0 and DB$target=1.

Imagine: n=100, DB$target = {0=80, 1=20}; test=20 and contains 10 random rows with DB$target=1 and 10 random rows with DB$target=0.

Many Thanks,
Eliano
Re: [R] Sampling with Constraints for testing and training data
On Wed, Jan 25, 2012 at 04:00:27AM -0800, Eliano wrote:
> Imagine: n=100 DB$target = { 0=80 1=20} test=20 and contain 10 random values of DB$target=1 and 10 random values of DB$target=0.

Hi.

One way is as follows.

t0 <- which(DB$target==0)
t1 <- which(DB$target==1)
m <- round(0.1*nrow(DB))
stopifnot(length(t0) >= m && length(t1) >= m)
index <- c(sample(t0, size=m), sample(t1, size=m))

Hope this helps.

Petr Savicky.
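Carried through to the test and training sets on made-up data, the recipe above might look like this. The contents of `DB` are invented to match the 80/20 example in the question. The sketch draws positions with `sample.int()` instead of calling `sample()` on the index vectors, which sidesteps the documented behaviour of `sample(x)` treating a single number `x` as `1:x` when a class has only one row.

```r
set.seed(7)

# Hypothetical database: 80 rows with target 0, 20 rows with target 1
DB <- data.frame(x = rnorm(100), target = rep(c(0, 1), times = c(80, 20)))

t0 <- which(DB$target == 0)
t1 <- which(DB$target == 1)
m  <- round(0.1 * nrow(DB))   # rows to take from each class
stopifnot(length(t0) >= m, length(t1) >= m)

# Draw m positions within each class, then map back to row numbers
index <- c(t0[sample.int(length(t0), m)],
           t1[sample.int(length(t1), m)])

test  <- DB[index, ]    # 20 rows, half of each class
train <- DB[-index, ]   # remaining 80 rows
table(test$target)
```

Note that `train` is then more unbalanced than `DB` itself, since half of the rare class has gone into `test`.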
Re: [R] sampling weights in package lme4
On Tue, Jan 24, 2012 at 6:19 PM, Mohd masood masood0...@rediffmail.com wrote:
> Dear All,
>
> I am trying to include sampling weights in a multilevel regression analysis using package lme4 with the following code:
>
> print(fm1 <- lmer(DC~sex+age+smoker+alcohol+fruits+(1|setting), dataset, REML = FALSE), corr = FALSE)
> print(fm2 <- lmer(DC~sex+age+smoker+alcohol+fruits+(1|setting), dataset, REML = FALSE), corr = FALSE, weights=sweight)
>
> The problem is that both calls give me exactly the same results. Is this weights argument not meant for sampling weights? If not, how can I include sampling weights in lme4?

It's not meant for sampling weights. It's meant for precision weights. How best to include sampling weights in mixed models is a research problem at the moment, but you can rely on getting the wrong answer if you just use the weights= argument.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
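Two hedged remarks on the exchange above. First, in the second quoted call the weights= argument appears to sit outside the lmer() parentheses, so it is handed to print() rather than to the model, which by itself would explain identical output. Second, to make "precision weights" concrete, the sketch below uses base R's lm() as a stand-in (not lme4) on simulated data: a weight w there means "this observation's error variance is proportional to 1/w", and the fit matches ordinary least squares on rows rescaled by sqrt(w). All variable names are made up.

```r
set.seed(24)

x <- rnorm(50)
w <- runif(50, 0.5, 2)                        # hypothetical precision weights
y <- 1 + 2 * x + rnorm(50, sd = 1 / sqrt(w))  # noisier where w is small

fit_unweighted <- lm(y ~ x)
fit_weighted   <- lm(y ~ x, weights = w)

# Equivalent formulation: ordinary least squares after scaling each row,
# including the intercept column, by sqrt(w)
sw <- sqrt(w)
fit_rescaled <- lm(I(y * sw) ~ 0 + sw + I(x * sw))

cbind(weighted = coef(fit_weighted), rescaled = coef(fit_rescaled))
```

Nothing in this calculation knows about selection probabilities or a sampling design, which is why treating sampling weights this way gives standard errors (and, in mixed models, variance components) that cannot be trusted.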
Re: [R] sampling weights in package lme4
On Jan 24, 2012, at 20:41 , Thomas Lumley wrote:
> It's not meant for sampling weights. It's meant for precision weights. How best to include sampling weights in mixed models is a research problem at the moment, but you can rely on getting the wrong answer if you just use the weights= argument.
>
> -thomas

Fortune nomination!

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] Sampling data every third hour
1) Use dput() to submit data.

2) Would this work? (It requires that your data are evenly spaced, but I think that's it.)

d[seq(1, nrow(d), by = 3), ]

Michael

On Wed, Dec 14, 2011 at 7:17 AM, abcdef ghijk lineh...@yahoo.com wrote:
> Good Morning,
>
> I want to sample the following time series for every third hour. For example at 00:00, 03:00, 06:00, 09:00 etc.
>
> 2011-01-01 00:00:00 0.00e+00
> 2011-01-01 01:00:00 1.471667e+01
> 2011-01-01 02:00:00 1.576667e+01
> 2011-01-01 03:00:00 0.00e+00
> 2011-01-01 04:00:00 0.00e+00
> 2011-01-01 05:00:00 0.00e+00
> 2011-01-01 06:00:00 0.00e+00
> 2011-01-01 07:00:00 0.00e+00
> 2011-01-01 08:00:00 0.00e+00
> 2011-01-01 09:00:00 1.826667e+01
> 2011-01-01 10:00:00 0.00e+00
> 2011-01-01 11:00:00 0.00e+00
> 2011-01-01 12:00:00 0.00e+00
> 2011-01-01 13:00:00 0.00e+00
> 2011-01-01 14:00:00 0.00e+00
> 2011-01-01 15:00:00 0.00e+00
> 2011-01-01 16:00:00 0.00e+00
> 2011-01-01 17:00:00 0.00e+00
> 2011-01-01 18:00:00 0.00e+00
> 2011-01-01 19:00:00 0.00e+00
> 2011-01-01 20:00:00 0.00e+00
> 2011-01-01 21:00:00 7.01e+01
> 2011-01-01 22:00:00 7.154167e+02
> 2011-01-01 23:00:00 2.039167e+02
> 2011-01-02 00:00:00 3.703000e+02
> 2011-01-02 01:00:00 9.130167e+02
> 2011-01-02 02:00:00 0.00e+00
> 2011-01-02 03:00:00 0.00e+00
>
> Thanks in advance.
>
> Regards,
> Shan
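If the series might have gaps, an alternative sketch is to select on the clock hour instead of the row position. The data frame `d` and its columns `time` and `value` are assumptions made here (the original post does not show the object's structure), and the time zone is fixed to UTC so the example behaves the same everywhere.

```r
# Hypothetical hourly series covering the same 28 timestamps as the post
times <- seq(as.POSIXct("2011-01-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = 28)
d <- data.frame(time = times, value = seq_along(times))

# Hour of day for each timestamp, as an integer 0..23
hours <- as.integer(format(d$time, "%H", tz = "UTC"))

# Keep 00:00, 03:00, 06:00, ... whatever the row spacing is
every3 <- d[hours %% 3 == 0, ]
every3$time
```

On evenly spaced data starting at midnight this picks the same rows as the seq()-based index above; it only differs when rows are missing or the series starts at another hour. For sub-hourly data the minutes would need to be checked as well.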
Re: [R] Sampling with conditions
[Yet another correction -- this one is important. I start from scratch this time]

On 07-Nov-11 22:22:54, SarahJoyes wrote:
> Hey everyone,
> I am at best, an amateur user of R, but I am stuck on how to set-up the following situation. I am trying to select a random sample of numbers from 0 to 10 and insert them into the first column of a matrix (which will used later in a loop). However, I need to have those numbers add up to 10. How can I set those conditions? So far I have:
>
> n <- matrix(0,nr=5,ncol=10)
> for(i in 1:10){n[i,1] <- sample(0:10,1)}
>
> How do I set-up the BUT sum(n[i,1])=10?
> Thanks
> SarahJ

Sarah, your example is confusing because you have set up a matrix 'n' with 5 rows and 10 columns. But your loop cycles through 10 rows!

However, assuming that your basic requirement is to sample 10 integers which add up to 10, consider rmultinom():

### Instead of: rmultinom(n=1,size=10,prob=(1:10)/10) ###

rmultinom(n=1,size=10,prob=rep(1,10)/10)
#       [,1]
#  [1,]    1
#  [2,]    0
#  [3,]    2
#  [4,]    3
#  [5,]    1
#  [6,]    1
#  [7,]    0
#  [8,]    0
#  [9,]    1
# [10,]    1

rmultinom(n=1,size=10,prob=rep(1,10)/10)
#       [,1]
#  [1,]    2
#  [2,]    0
#  [3,]    1
#  [4,]    1
#  [5,]    2
#  [6,]    2
#  [7,]    1
#  [8,]    0
#  [9,]    1
# [10,]    0

This gives a uniform distribution over the positions in the sample vector for the sampled integers, so that all permutations are equally likely. For a non-uniform distribution, vary 'prob'.

Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 08-Nov-11 Time: 08:13:36
-- XFMail --
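Adapted to the 5-row matrix from the question, the rmultinom() idea might be used as in this sketch. It assumes five groups with equal probabilities, which is one reading of the requirement.

```r
set.seed(11)

# 5 rows (groups) and 10 columns, as in the original matrix
n <- matrix(0, nrow = 5, ncol = 10)

# Distribute 10 individuals over the 5 rows, each row equally likely
n[, 1] <- rmultinom(n = 1, size = 10, prob = rep(1, 5) / 5)

n[, 1]        # five non-negative counts
sum(n[, 1])   # always 10 by construction
```

The multinomial treats every position alike, but it favours balanced splits: an outcome such as (10, 0, 0, 0, 0) is far less probable than one close to (2, 2, 2, 2, 2).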
Re: [R] Sampling with conditions
Sorry about being confusing. I have so many loops within loops and ifelses that I get mixed up sometimes. It was just a typo; it was supposed to be for(i in 1:5).

Thanks for your help!
SJ
Re: [R] Sampling with conditions
That is exactly what I want, and it's so simple! Thanks so much!
Re: [R] Sampling with conditions
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of SarahJoyes
> Sent: Tuesday, November 08, 2011 5:57 AM
> To: r-help@r-project.org
> Subject: Re: [R] Sampling with conditions
>
> That is exactly what I want, and it's so simple! Thanks so much!

Sarah,

I want to point out that my post was qualified by something like "I am not sure it is exactly what you want." Since you didn't quote my post, let me show my suggestion and then express my concern.

n <- matrix(0, nrow=5, ncol=10)
repeat{
  c1 <- sample(0:10, 4, replace=TRUE)
  if(sum(c1) <= 10) break
}
n[,1] <- c(c1, 10-sum(c1))
n

This nominally meets your criteria, but it will tend to result in larger digits being under-represented. For example, you are unlikely to get a result like c(0,8,0,0,2) or c(9,0,0,1,0). That may be OK for your purposes, but I wanted to point it out.

You could use something like

n <- matrix(0, nrow=5, ncol=10)
c1 <- rep(0,4)
for(i in 1:4){
  upper <- 10-sum(c1)
  c1[i] <- sample(0:upper, 1, replace=TRUE)
  if(sum(c1) == 10) break
}
n[,1] <- c(c1, 10-sum(c1))
n

if that would suit your purposes better.

Good luck,
Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
Re: [R] Sampling with conditions
In addition to Dan's quite valid concern, the final sample is not truly 'random': the first k - 1 elements are randomly chosen, but the last is determined so that the constraint is met.

Dennis
Re: [R] Sampling with conditions
Perhaps a little bit of context may be helpful. I am trying to figure out the ideal age structure for a population of ten individuals that would yield the best overall survival rate, given that each age group has a different survivability and a different reproductive rate. So yes, having a bias for smaller numbers would be a problem.

The only other problem that I see with your revised code is that there will be a bias towards having higher numbers in the first age group, i.e. the first row of the column... The other idea I was playing with was to create a series of ifelse statements for each row of the column. Something like:

n <- matrix(0,nr=5,ncol=10)
n[1,1] <- sample(0:10,1)
n[2,1] <- ifelse(n[1,1]>=10,0,sample(0:10,1))
n[3,1] <- ifelse(sum(n[i,1])>10,0,sample(0:10,1))
etc...

I still think that might be biased towards high numbers in the first rows though... hmmm

SJ
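One standard way to avoid favouring any row is the "stars and bars" construction, which is not something proposed in this thread but fits the stated goal: it draws every possible way of splitting 10 individuals over 5 age groups with the same probability. The helper name `rcomposition` is made up for this sketch.

```r
set.seed(2011)

# Draw one split of `total` into `k` non-negative integer parts,
# uniformly over all such ordered splits (stars and bars).
rcomposition <- function(total, k) {
  # Choose k - 1 distinct "bar" positions among total + k - 1 slots
  bars <- sort(sample.int(total + k - 1, k - 1))
  # Each part is the number of free slots between neighbouring bars
  diff(c(0, bars, total + k)) - 1
}

n <- matrix(0, nrow = 5, ncol = 10)
n[, 1] <- rcomposition(10, 5)

n[, 1]        # five non-negative counts, no row favoured
sum(n[, 1])   # always 10
```

With 10 individuals and 5 groups there are choose(14, 4) = 1001 possible splits, so a full search over all age structures is also feasible instead of random sampling. The repeat-until-the-sum-fits idea from earlier in the thread should, as far as I can tell, also be uniform over these splits, because each accepted 4-tuple corresponds to exactly one split.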
Re: [R] Sampling with conditions
Not sure this is valid: you can have 9 random samples out of 10, but the last one has to be fixed to meet the constraint, sum=10.

Weidong

On Mon, Nov 7, 2011 at 5:22 PM, SarahJoyes sjo...@uoguelph.ca wrote:
> Hey everyone,
> I am at best, an amateur user of R, but I am stuck on how to set-up the following situation. I am trying to select a random sample of numbers from 0 to 10 and insert them into the first column of a matrix (which will used later in a loop). However, I need to have those numbers add up to 10. How can I set those conditions? So far I have:
>
> n <- matrix(0,nr=5,ncol=10)
> for(i in 1:10){n[i,1] <- sample(0:10,1)}
>
> How do I set-up the BUT sum(n[i,1])=10?
> Thanks
> SarahJ
Re: [R] Sampling with conditions
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of SarahJoyes
> Sent: Monday, November 07, 2011 2:23 PM
> To: r-help@r-project.org
> Subject: [R] Sampling with conditions
>
> Hey everyone,
> I am at best, an amateur user of R, but I am stuck on how to set-up the following situation. I am trying to select a random sample of numbers from 0 to 10 and insert them into the first column of a matrix (which will used later in a loop). However, I need to have those numbers add up to 10. How can I set those conditions? So far I have:
>
> n <- matrix(0,nr=5,ncol=10)
> for(i in 1:10){n[i,1] <- sample(0:10,1)}
>
> How do I set-up the BUT sum(n[i,1])=10?
> Thanks
> SarahJ

Sarah,

Does something like this do what you want?

n <- matrix(0, nrow=5, ncol=10)
repeat{
  c1 <- sample(0:10, 4, replace=TRUE)
  if(sum(c1) <= 10) break
}
n[,1] <- c(c1, 10-sum(c1))
n

Hope this is helpful,
Dan

Daniel Nordlund
Bothell, WA USA
Re: [R] Sampling with conditions
On 07-Nov-11 22:22:54, SarahJoyes wrote:
> Hey everyone,
> I am at best, an amateur user of R, but I am stuck on how to set-up the following situation. I am trying to select a random sample of numbers from 0 to 10 and insert them into the first column of a matrix (which will used later in a loop). However, I need to have those numbers add up to 10. How can I set those conditions? So far I have:
>
> n <- matrix(0,nr=5,ncol=10)
> for(i in 1:10){n[i,1] <- sample(0:10,1)}
>
> How do I set-up the BUT sum(n[i,1])=10?
> Thanks
> SarahJ

Sarah, your example is confusing because you have set up a matrix 'n' with 5 rows and 10 columns. But your loop cycles through 10 rows!

However, assuming that your basic requirement is to sample 10 integers which add up to 10, consider rmultinom():

rmultinom(n=1,size=10,prob=(1:10)/10)
#       [,1]
#  [1,]    1
#  [2,]    0
#  [3,]    2
#  [4,]    0
#  [5,]    1
#  [6,]    1
#  [7,]    2
#  [8,]    0
#  [9,]    1
# [10,]    2

rmultinom(n=1,size=10,prob=(1:10)/10)
#       [,1]
#  [1,]    0
#  [2,]    0
#  [3,]    0
#  [4,]    0
#  [5,]    1
#  [6,]    1
#  [7,]    2
#  [8,]    1
#  [9,]    2
# [10,]    3

This gives each integer in (0:10) equal chances of being in the sample. For unequal chances, vary 'prob'.

Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 08-Nov-11 Time: 00:25:54
-- XFMail --
Re: [R] Sampling with conditions
[Correction below (I was writing too late at night ...)]

On 08-Nov-11 00:25:57, Ted Harding wrote:
> On 07-Nov-11 22:22:54, SarahJoyes wrote:
> > Hey everyone,
> > I am, at best, an amateur user of R, but I am stuck on how to set up the following situation. I am trying to select a random sample of numbers from 0 to 10 and insert them into the first column of a matrix (which will be used later in a loop). However, I need those numbers to add up to 10. How can I set those conditions? So far I have:
> >
> >   n <- matrix(0, nr=5, ncol=10)
> >   for(i in 1:10){n[i,1] <- sample(0:10,1)}
> >
> > How do I set up the "BUT sum(n[i,1])=10"?
> > Thanks
> > SarahJ
>
> Sarah, your example is confusing because you have set up a matrix 'n' with 5 rows and 10 columns. But your loop cycles through 10 rows!
>
> However, assuming that your basic requirement is to sample 10 integers which add up to 10, consider rmultinom():
>
>   rmultinom(n=1,size=10,prob=(1:10)/10)
>   #       [,1]
>   #  [1,]    1
>   #  [2,]    0
>   #  [3,]    2
>   #  [4,]    0
>   #  [5,]    1
>   #  [6,]    1
>   #  [7,]    2
>   #  [8,]    0
>   #  [9,]    1
>   # [10,]    2
>
>   rmultinom(n=1,size=10,prob=(1:10)/10)
>   #       [,1]
>   #  [1,]    0
>   #  [2,]    0
>   #  [3,]    0
>   #  [4,]    0
>   #  [5,]    1
>   #  [6,]    1
>   #  [7,]    2
>   #  [8,]    1
>   #  [9,]    2
>   # [10,]    3
>
> This gives each integer in (0:10) equal chances of being in the sample. For unequal chances, vary 'prob'.
>
> Hoping this helps,
> Ted.

That should have read:

This gives a uniform distribution over the positions in the sample vector for the sampled integers, so that all permutations are equally likely. For a non-uniform distribution, vary 'prob'.

Sorry,
Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 08-Nov-11 Time: 07:40:51
-- XFMail --
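A note on the 'prob' argument, offered as a sketch rather than a correction of the posted call: rmultinom() rescales 'prob' internally so that it sums to 1, which means prob=(1:10)/10 weights position i in proportion to i (and both outputs shown above do lean towards the later positions). If equal weights for the 10 positions are what is wanted, which is an assumption about the intent, equal probabilities can be passed instead:

```r
set.seed(42)

# Equal weights for the 10 positions (an assumption about what was intended)
u <- rmultinom(n = 1, size = 10, prob = rep(1 / 10, 10))
sum(u)   # the counts always add up to `size`

# Average counts over many draws: roughly flat for equal weights,
# increasing with the position index for prob = (1:10)/10
flat   <- rowMeans(rmultinom(5000, size = 10, prob = rep(1 / 10, 10)))
sloped <- rowMeans(rmultinom(5000, size = 10, prob = (1:10) / 10))
round(rbind(flat, sloped), 2)
```

Either way the column sums equal 'size', so the "must add up to 10" condition holds without any rejection step.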
Re: [R] sampling from the multivariate truncated normal
> Well, for 0.828324 < x[2] < Inf the probability is roughly 0, hence it is not easy to draw random numbers out there.
>
> Uwe Ligges

How is this probability roughly 0?

-- View this message in context: http://r.789695.n4.nabble.com/sampling-from-the-multivariate-truncated-normal-tp3626438p3647039.html
Re: [R] sampling from the multivariate truncated normal
On 26.06.2011 21:26, statfan wrote:
> I am trying to generate a sample from a truncated multivariate normal distribution via the rtmvnorm function in the {tmvtnorm} package. Why does the following produce NaNs?
>
>   rtmvnorm(1, mean = rep(0, 2),
>            matrix(c(0.06906084, -0.07463565, -0.07463565, 0.08078086), 2),
>            c(-0.4316738, 0.8283240), c(Inf, Inf),
>            algorithm="gibbsR", burn.in.samples=100)

Well, for 0.828324 < x[2] < Inf the probability is roughly 0, hence it is not easy to draw random numbers out there.

Uwe Ligges

> Thanks
>
> -- View this message in context: http://r.789695.n4.nabble.com/sampling-from-the-multivariate-truncated-normal-tp3626438p3626438.html
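To see why the truncation region carries so little probability, one can look at the covariance matrix with base R only. The sketch below is an illustration, not part of the original replies: the lower bound on x[2] sits nearly three marginal standard deviations out, and because the two components are almost perfectly negatively correlated, a large x[2] forces x[1] far below its own lower bound.

```r
S  <- matrix(c(0.06906084, -0.07463565, -0.07463565, 0.08078086), 2)
lo <- c(-0.4316738, 0.8283240)   # lower truncation bounds from the post

# correlation between x[1] and x[2]
rho <- cov2cor(S)[1, 2]

# marginal probability that x[2] exceeds its lower bound
p_x2 <- pnorm(lo[2], mean = 0, sd = sqrt(S[2, 2]), lower.tail = FALSE)

# conditional distribution of x[1] given that x[2] sits exactly at its bound
cond_mean <- S[1, 2] / S[2, 2] * lo[2]
cond_sd   <- sqrt(S[1, 1] - S[1, 2]^2 / S[2, 2])

# how many conditional sds the x[1] bound lies above the conditional mean
z <- (lo[1] - cond_mean) / cond_sd

c(rho = rho, p_x2 = p_x2, z = z)
```

The marginal tail probability for x[2] is already small, and the value of z shows that x[1] would additionally have to land dozens of conditional standard deviations above its conditional mean. Larger values of x[2] only push that conditional mean further down, so the joint probability of the region is practically zero.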
Re: [R] sampling design runs with no errors but returns empty data set
On Thu, Mar 31, 2011 at 4:01 AM, Simon Kiss sjk...@gmail.com wrote:
> Dear colleagues,
> I'm working with the 2008 Canada Election Studies (http://www.queensu.ca/cora/_files/_CES/CES2008.sav.zip), trying to construct a weighted national sample using the survey package. Three weights are included in the national survey (a household weight, a provincial weight and a national weight which is a product of the first two). In the following code I removed variables with missing national weights and tried to construct the sample from advice I've gleaned from the documentation for the survey package and other help requests.
> There are no errors, but the data frame (weight_test) contains no data. What am I missing?
> Yours, Simon Kiss
> P.S. The code is only reproducible if the data set is downloadable. I'm not sure
>
>   ces <- read.spss(file.choose(), to.data.frame=TRUE, use.value.labels=FALSE)
>   missing_data <- subset(ces1, !is.na(ces08_NATWGT))
>   weight_test <- svydesign(id=~0, weights=~ces08_NATWGT, data=missing_data)

The code isn't reproducible even with the data. The code refers to a data frame ces1, which isn't defined, and to a variable ces08_NATWGT that isn't in the data set.

However, a bit of Googling suggests that the variable CES08_NA is probably the one you mean, giving the following code

  library(survey)
  library(foreign)
  ces <- read.spss("CES2008.sav", to.data.frame=TRUE, use.value.labels=FALSE)
  missing_data <- subset(ces, !is.na(CES08_NA))
  weight_test <- svydesign(id=~0, weights=~CES08_NA, data=missing_data)

which seems to produce a perfectly reasonable survey design object.

  > weight_test
  Independent Sampling design (with replacement)
  svydesign(id = ~0, weights = ~CES08_NA, data = missing_data)
  > dim(weight_test)
  [1] 3257  531
  > svymean(~factor(GENDER), weight_test)
                     mean   SE
  factor(GENDER)1 0.47362 0.01
  factor(GENDER)5 0.52638 0.01

Since you don't say how you concluded the object contained no data, I don't know what you were seeing. Note that weight_test is not supposed to be a data frame. It's a survey design object.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
Re: [R] sampling
Hi,

What about the split function? See ?split.

Divide x into 2 data frames:

  a <- split(x, 1:2)
  a[[1]]   # first data frame
  a[[2]]   # second data frame

Regards,
M

On 17/02/11 05:35, yf wrote:
> I want to sample from the ID. For each ID, I want to have 2 sets of data. I tried the sample() function but it didn't work.
>
>   x <- data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))
>   x
>      id v1 V2
>   1   1  1 12
>   2   1  2 13
>   3   1  3 14
>   4   2  4 15
>   5   2  5 16
>   6   2  6 17
>   7   2  7 18
>   8   3  8 19
>   9   3  9 20
>   10  3 10 21
>   11  4 11 22
>   12  4 12 23

--
Mohamed Lajnef, IE INSERM U955 eq 15
Pôle de Psychiatrie
Hôpital CHENEVIER
40, rue Mesly
94010 CRETEIL Cedex FRANCE
mohamed.laj...@inserm.fr
tel : 01 49 81 31 31 (poste 18467)
Sec : 01 49 81 32 90
fax : 01 49 81 30 99
Re: [R] sampling
On Feb 16, 2011, at 11:35 PM, yf wrote:
> I want to sample from the ID. For each ID, I want to have 2 sets of data. I tried the sample() function but it didn't work.

You don't say _how_ you used the sample function. You should show what code you used when stating that _something_ doesn't work. sample() returns a vector of items from objects where length() represents some sensible notion. It does not sample a complex object such as a data frame. For data frames, length is the number of columns, which doesn't agree very well with most people's notion of cases from which to sample.

To select rows of a data frame you need to first create a vector of numeric indices and then use that with "[":

  idx <- sample(nrow(x), nrow(x)/2)   # A random split
  x[ idx, ]
  x[ -idx, ]

>   x <- data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))
>   x
>      id v1 V2
>   1   1  1 12
>   2   1  2 13
>   3   1  3 14
>   4   2  4 15
>   5   2  5 16
>   6   2  6 17
>   7   2  7 18
>   8   3  8 19
>   9   3  9 20
>   10  3 10 21
>   11  4 11 22
>   12  4 12 23
>
> -- View this message in context: http://r.789695.n4.nabble.com/sampling-tp3310184p3310184.html

David Winsemius, MD
West Hartford, CT
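The point about length() versus nrow() can be checked directly on the posted data. A minimal sketch of the random split described above (the names half1 and half2 are just illustrative):

```r
x <- data.frame(id = c(1,1,1,2,2,2,2,3,3,3,4,4), v1 = 1:12, V2 = 12:23)

length(x)   # for a data frame, length() counts columns
nrow(x)     # the number of cases, which is what we want to sample from

set.seed(1)
idx   <- sample(nrow(x), nrow(x) / 2)   # row indices for one half
half1 <- x[idx, ]                       # the sampled rows
half2 <- x[-idx, ]                      # everything else
```

The two halves are disjoint and together cover every row, but this split ignores 'id', so it does not by itself give a fixed number of rows per id.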
Re: [R] sampling
But I need each id to have two data rows. Like...

  x
     id v1 V2
  1   1  1 12
  2   1  2 13
  4   2  4 15
  5   2  5 16
  8   3  8 19
  9   3  9 20
  11  4 11 22
  12  4 12 23

So should write sample( if sample id 2 ,2). I don't know how to write (if sample id 2).

Thanks.

-- View this message in context: http://r.789695.n4.nabble.com/sampling-tp3310184p3311253.html
Re: [R] sampling
This is, maybe, not the best solution but I hope it will help you:

  x <- data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))
  do.call(rbind, by(x, x$id, function(x) x[c(sample(nrow(x),2)),]))

Andrija

On Thu, Feb 17, 2011 at 6:39 PM, yf chang...@umn.edu wrote:
> But I need each id to have two data rows. Like...
>
>   x
>      id v1 V2
>   1   1  1 12
>   2   1  2 13
>   4   2  4 15
>   5   2  5 16
>   8   3  8 19
>   9   3  9 20
>   11  4 11 22
>   12  4 12 23
>
> So should write sample( if sample id 2 ,2). I don't know how to write (if sample id 2).
>
> Thanks.
>
> -- View this message in context: http://r.789695.n4.nabble.com/sampling-tp3310184p3311253.html
Re: [R] sampling
On Feb 17, 2011, at 1:33 PM, andrija djurovic wrote:
> This is, maybe, not the best solution but I hope it will help you:
>
>   x <- data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))
>   do.call(rbind, by(x, x$id, function(x) x[c(sample(nrow(x),2)),]))
>
> Andrija

Another way (and note that by is just a wrapper for tapply):

  > tapply(1:nrow(x), x$id, sample, 2)
  $`1`
  [1] 2 3

  $`2`
  [1] 5 4

  $`3`
  [1] 10  8

  $`4`
  [1] 11 12

  > x[unlist( tapply(1:nrow(x), x$id, sample, 2) ), ]
     id v1 V2
  2   1  2 13
  3   1  3 14
  5   2  5 16
  6   2  6 17
  9   3  9 20
  8   3  8 19
  12  4 12 23
  11  4 11 22

> On Thu, Feb 17, 2011 at 6:39 PM, yf chang...@umn.edu wrote:
> > But I need each id to have two data rows. Like...
> >
> >   x
> >      id v1 V2
> >   1   1  1 12
> >   2   1  2 13
> >   4   2  4 15
> >   5   2  5 16
> >   8   3  8 19
> >   9   3  9 20
> >   11  4 11 22
> >   12  4 12 23
> >
> > So should write sample( if sample id 2 ,2). I don't know how to write (if sample id 2).
> >
> > Thanks.
> >
> > -- View this message in context: http://r.789695.n4.nabble.com/sampling-tp3310184p3311253.html

David Winsemius, MD
West Hartford, CT
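One caveat with passing sample directly to tapply(), added here as a hedged aside: when a group contains a single row, sample() receives one number n and samples from 1:n instead, and it also fails if asked for more items than the group has. The sketch below uses a hypothetical helper, pick_rows, and adds a one-row id 5 to the posted data purely for illustration.

```r
x <- data.frame(id = c(1,1,1,2,2,2,2,3,3,3,4,4,5), v1 = 1:13, V2 = 12:24)
# id 5 has a single row (added here for illustration only)

# Hypothetical helper: pick up to k of the row numbers in `rows`,
# indexing by position so a single row number is never expanded to 1:n
pick_rows <- function(rows, k = 2) {
  rows[sample.int(length(rows), min(k, length(rows)))]
}

set.seed(1)
idx <- unlist(tapply(seq_len(nrow(x)), x$id, pick_rows))
x[idx, ]
```

Each id with at least two rows contributes exactly two rows, and the single-row id contributes its one row rather than a row borrowed from another id.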
Re: [R] sampling
Hi:

A couple more approaches to consider:

  # Utility function to extract two rows from a data frame
  # Meant to be applied to each data subset
  sampler <- function(d) if(nrow(d) > 2) d[sample(1:nrow(d), 2, replace = FALSE), ] else d

  library(plyr)
  ddply(x, 'id', sampler)
    id v1 V2
  1  1  2 13
  2  1  1 12
  3  2  4 15
  4  2  6 17
  5  3  8 19
  6  3 10 21
  7  4 11 22
  8  4 12 23

  library(data.table)
  dtx <- data.table(x, key = 'id')
  dtx[, sampler(.SD), by = 'id']
       id v1 V2
  [1,]  1  1 12
  [2,]  1  3 14
  [3,]  2  5 16
  [4,]  2  7 18
  [5,]  3  9 20
  [6,]  3 10 21
  [7,]  4 11 22
  [8,]  4 12 23

HTH,
Dennis

On Wed, Feb 16, 2011 at 8:35 PM, yf chang...@umn.edu wrote:
> I want to sample from the ID. For each ID, I want to have 2 sets of data. I tried the sample() function but it didn't work.
>
>   x <- data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))
>   x
>      id v1 V2
>   1   1  1 12
>   2   1  2 13
>   3   1  3 14
>   4   2  4 15
>   5   2  5 16
>   6   2  6 17
>   7   2  7 18
>   8   3  8 19
>   9   3  9 20
>   10  3 10 21
>   11  4 11 22
>   12  4 12 23
>
> -- View this message in context: http://r.789695.n4.nabble.com/sampling-tp3310184p3310184.html
Re: [R] Sampling from multi-dimensional kernel density estimation
Generating new data from a kernel density estimate is equivalent to choosing a point from your data at random, then generating a point from your kernel centered at the chosen point.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Christoph Goebel
> Sent: Friday, November 19, 2010 1:56 PM
> To: r-help@r-project.org
> Subject: [R] Sampling from multi-dimensional kernel density estimation
>
> Hi,
> I'd like to use a three-dimensional dataset to build a kernel density and then sample from the distribution. I already used the npudens function in the np package to estimate the density and plot it:
>
>   fit <- npudens(~x+y+z)
>   plot(fit)
>
> It takes some time but appears to work well. How can I use this to evaluate the fitted function at a certain point, e.g. (x=1, y=1, z=1)? Does R provide methods for sampling from the fitted function?
> Thanks,
> Christoph
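The recipe above (pick a data point at random, then draw from the kernel centred on it) can be sketched in base R. Everything specific here is an assumption made for illustration: the toy data, a product Gaussian kernel, the helper name rkde, and bandwidths from bw.nrd0 standing in for whatever bandwidths the np fit would actually use.

```r
set.seed(1)

# Toy 3-d data standing in for x, y, z (made up for illustration)
dat <- cbind(x = rnorm(200), y = rnorm(200, 5), z = rexp(200))

# Per-dimension bandwidths; bw.nrd0 is only a placeholder for the
# bandwidths an actual density fit would supply
bw <- apply(dat, 2, bw.nrd0)

# Hypothetical helper: draw n points from a product-Gaussian KDE
rkde <- function(n, dat, bw) {
  # step 1: choose data points at random, with replacement
  centres <- dat[sample.int(nrow(dat), n, replace = TRUE), , drop = FALSE]
  # step 2: add kernel noise, scaled column-wise by the bandwidths
  noise <- matrix(rnorm(n * ncol(dat)), n, ncol(dat))
  centres + sweep(noise, 2, bw, `*`)
}

new_pts <- rkde(1000, dat, bw)
colMeans(new_pts)   # should be close to colMeans(dat)
```

For a different kernel, only step 2 changes: the noise would be drawn from that kernel instead of rnorm().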
Re: [R] Sampling problem
Michael, I really appreciate your help. But I got the following error message when I was trying to run the function you wrote:

  Error in out[i, ] <- apply(help[, c(grp1 + 1, grp2 + 5)], 2, sample, 1) :
    number of items to replace is not a multiple of replacement length

I am not quite sure why this would happen. As a novice in R, I find these functions kind of complex. I am wondering if it is doable without using loops like that.

Again, thank you so much!!!

-- View this message in context: http://r.789695.n4.nabble.com/Sampling-problem-tp3043804p3044249.html
Re: [R] Sampling problem
On 16 November 2010 16:10, wangwallace talentt...@gmail.com wrote:
> Michael, I really appreciate your help. But I got the following error message when I was trying to run the function you wrote:
>
>   Error in out[i, ] <- apply(help[, c(grp1 + 1, grp2 + 5)], 2, sample, 1) :
>     number of items to replace is not a multiple of replacement length

Did the data.frame or matrix you were sampling have the same general form as the example you posted previously? Can you give me a small example that causes the error?

> I am not quite sure why this would happen. As a novice in R, I find these functions kind of complex. I am wondering if it is doable without using loops like that.

I wasn't sure exactly what you wanted, so the function was meant to be general and easy to modify. It is often possible to use constructs other than loops in R, though that doesn't mean the code will always be either faster or clearer. But you'll need to describe your requirements in more precise terms (short, clear examples are good) for folks here to suggest methods.

> Again, thank you so much!!!

No worries. If you can provide an example that generates the error we should be able to get further.

Michael
Re: [R] Sampling problem
Yes, the data.frame is exactly the same as the one I posted earlier. I was trying to see if the loop function works, and I got that message. Below is the syntax I was trying to run, followed by the error message at the end:

  > sampleX <- function(X,nGrp1,nsamples){if(nGrp1>=4)stop("can't sample all group 1 variables")
  + out <- matrix(0,nsamples,nGrp1+1)
  + for(i in 1:nsamples){
  + grp1 <- sample(4,nGrp1)
  + grp2 <- sample((1:4)[-grp1],)
  + out[i,] <- apply(X[,c(grp1+1,grp2+5)],2,sample,1)
  + }
  + out}
  > sampleX(help,1,10)
  Error in out[i, ] <- apply(X[, c(grp1 + 1, grp2 + 5)], 2, sample, 1) :
    number of items to replace is not a multiple of replacement length

By the way, this is only a small piece of my data set, which has 12 variables (or columns) for each group (grp1: CSE1, CSE2, CSE3, CSE4, CSE5, CSE6, CSE7, CSE8, CSE9, CSE10, CSE11, CSE12; grp2: WSE1, WSE2, WSE3, WSE4, WSE5, WSE6, WSE7, WSE8, WSE9, WSE10, WSE11, WSE12). I will draw 1000 random samples for each of the 11 different combinations below:

  combination 1:   1 variable  from grp1 + 11 variables from grp2 = 12 variables
  combination 2:   2 variables from grp1 + 10 variables from grp2 = 12 variables
  combination 3:   3 variables from grp1 +  9 variables from grp2 = 12 variables
  combination 4:   4 variables from grp1 +  8 variables from grp2 = 12 variables
  combination 5:   5 variables from grp1 +  7 variables from grp2 = 12 variables
  combination 6:   6 variables from grp1 +  6 variables from grp2 = 12 variables
  combination 7:   7 variables from grp1 +  5 variables from grp2 = 12 variables
  combination 8:   8 variables from grp1 +  4 variables from grp2 = 12 variables
  combination 9:   9 variables from grp1 +  3 variables from grp2 = 12 variables
  combination 10: 10 variables from grp1 +  2 variables from grp2 = 12 variables
  combination 11: 11 variables from grp1 +  1 variable  from grp2 = 12 variables

As shown above, the number of variables in each combination has to sum to 12.

Also, I want to restrict the vector I am going to sample from to only those columns that do not correspond to the grp1 variables I have sampled. For example, if I sampled 1 variable, say CSE1, from grp1, the other 11 variables from grp2 should not include WSE1; if I sampled 2 variables, say CSE1 and CSE2, from grp1, the other 10 variables from grp2 should not include WSE1 and WSE2.

Anyway, this is a much more complicated example than the one I described in my first post. But I think I can modify your function if I want to apply it to the large data set with 12 variables for each group, since they basically share the same method. Now I am wondering where the error message comes from.

Again, thanks!! :)

-- View this message in context: http://r.789695.n4.nabble.com/Sampling-problem-tp3043804p3045095.html
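One plausible source of the error, assuming the code was run exactly as posted: the call sample((1:4)[-grp1],) has no size argument, whereas the function as originally suggested in this thread uses sample((1:4)[-grp1], 1). Without a size, sample() returns a permutation of all remaining indices, so apply() yields more values than the output row has columns. A small sketch of the mismatch (the variable names follow the post; the rest is illustrative):

```r
set.seed(1)
grp1 <- sample(4, 1)          # one group-1 index, as with nGrp1 = 1
candidates <- (1:4)[-grp1]    # the three complementary indices

length(sample(candidates))     # no size given: a permutation of all three
length(sample(candidates, 1))  # size = 1: a single index

# One output row has nGrp1 + 1 = 2 slots, but 1 + 3 = 4 values arrive
out <- matrix(0, 10, 2)
res <- tryCatch({ out[1, ] <- 1:4; "ok" },
                error = function(e) conditionMessage(e))
res   # the same kind of message as in the post
```

If this is indeed the cause, restoring the size argument (or sizing the output matrix to match the number of sampled columns) should make the error go away.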
Re: [R] Sampling problem
Hello,

Is this what you want?

  sampleX <- function(X, nGrp1, nsamples) {
    # X is matrix or data.frame with cols for two groups of variables
    # with grp1 in cols 2:5 and grp2 in cols 6:9
    #
    # nGrp1 - number of variables to sample from group 1
    #
    # nsamples - number of rows in output matrix

    if (nGrp1 >= 4) stop("can't sample all group 1 variables")

    out <- matrix(0, nsamples, nGrp1+1)
    for (i in 1:nsamples) {
      # choose grp1 vars to sample
      grp1 <- sample(4, nGrp1)
      # choose complementary grp2 var to sample
      grp2 <- sample((1:4)[-grp1], 1)
      # sample 1 value from each var
      out[i, ] <- apply(X[,c(grp1+1, grp2+5)], 2, sample, 1)
    }

    out
  }

Michael

On 16 November 2010 07:59, wangwallace talentt...@gmail.com wrote:
> Hey,
> I am hoping someone can help me with a sampling question. I have a data frame of 8 variables (the first column is the subjects' id):
>
>   SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
>       1    6    5    6    2    6    2    2    4
>       2    6    4    7    2    6    6    2    3
>       3    5    5    5    5    5    5    4    5
>       4    5    4    3    4    4    4    5    2
>       5    5    6    7    5    6    4    4    1
>       6    5    4    3    6    4    3    7    3
>       7    3    6    6    3    6    5    2    1
>       8    3    6    6    3    6    5    4    7
>
> The 8 variables are categorized into two groups, with CSE1, CSE2, CSE3, and CSE4 in one group and the rest in another group.
>
>   > sample(data[,2:4],2,replace=FALSE)
>     CSE1 CSE2
>   1    6    5
>   2    6    4
>   3    5    5
>   4    5    4
>   5    5    6
>   6    5    4
>   7    3    6
>   8    3    6
>
> Now I want to sample 1 column from the other group of variables (i.e., WSE1, WSE2, WSE3, WSE4), but I want to restrict the vector I am going to sample from to only those columns that do not correspond to the GROUP 1 variables I have sampled. That is, I want to sample a column from WSE3, WSE4. Columns corresponding to CSE1 and CSE2 (i.e., WSE1, WSE2) need to be dropped. How can I do this?
>
> What if I want to repeat this whole process (drawing 2 random columns from CSE1, CSE2, CSE3, and CSE4 first, AND then another random column from WSE1, WSE2, WSE3, and WSE4) 1000 times? Any ideas?
>
> Many thanks in advance!!
>
> -- View this message in context: http://r.789695.n4.nabble.com/Sampling-problem-tp3043804p3043804.html
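For the larger 12 + 12 layout described elsewhere in this thread, the same idea can be parameterised. The following is only a sketch under stated assumptions: the function name sample_complementary, its arguments, and the column layout (id in column 1, then p group-1 columns, then p group-2 columns in matching order) are all hypothetical, and it is tried here on the 4 + 4 example data from the original question.

```r
# Example data from the original question
X <- data.frame(
  SubID = 1:8,
  CSE1 = c(6,6,5,5,5,5,3,3), CSE2 = c(5,4,5,4,6,4,6,6),
  CSE3 = c(6,7,5,3,7,3,6,6), CSE4 = c(2,2,5,4,5,6,3,3),
  WSE1 = c(6,6,5,4,6,4,6,6), WSE2 = c(2,6,5,4,4,3,5,5),
  WSE3 = c(2,2,4,5,4,7,2,4), WSE4 = c(4,3,5,2,1,3,1,7)
)

# Hypothetical generalisation: p variables per group, n_grp1 columns drawn
# from group 1 and n_grp2 from the non-corresponding group-2 columns
sample_complementary <- function(X, p, n_grp1, n_grp2, nsamples) {
  stopifnot(n_grp1 + n_grp2 <= p)
  out <- matrix(NA_real_, nsamples, n_grp1 + n_grp2)
  for (i in seq_len(nsamples)) {
    grp1 <- sample.int(p, n_grp1)               # group-1 variable indices
    remaining <- setdiff(seq_len(p), grp1)      # indices not yet used
    grp2 <- remaining[sample.int(length(remaining), n_grp2)]
    cols <- c(grp1 + 1, grp2 + p + 1)           # column positions in X
    # one randomly chosen value from each selected column
    out[i, ] <- vapply(cols, function(j) {
      as.numeric(X[sample.int(nrow(X), 1), j])
    }, numeric(1))
  }
  out
}

set.seed(1)
res <- sample_complementary(X, p = 4, n_grp1 = 2, n_grp2 = 1, nsamples = 1000)
dim(res)
```

With p = 12, calling it with n_grp1 = k and n_grp2 = 12 - k for k in 1:11 would cover the eleven combinations listed in the thread; indexing with sample.int() on positions avoids the single-number behaviour of sample().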
Re: [R] sampling from normal
Hi Solafah,

You are right that the two commands are equivalent when p = pnorm(a). You can check the results with the following code.

  n <- 5
  a <- -1
  set.seed(123456)
  qnorm(runif(n,0,pnorm(a)))

  p <- pnorm(a)
  set.seed(123456)
  qnorm(p*runif(n))

Anyway, the elements of the lower tail are not chosen equally by this method. I may try another method, such as:

  s1 <- rnorm(10000)
  n <- 5
  a <- -1
  sample(s1[s1<a],n)

- An R learner.

-- View this message in context: http://r.789695.n4.nabble.com/sampling-from-normal-tp3003016p3003164.html
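The equivalence can be verified programmatically, which is a little more convenient than comparing printed values. A minimal sketch using the same seed and values as above (the names s_a and s_b are illustrative):

```r
n <- 5
a <- -1
p <- pnorm(a)   # probability mass of the lower tail below a

set.seed(123456)
s_a <- qnorm(runif(n, 0, p))   # uniform on (0, p), then inverse CDF
set.seed(123456)
s_b <- qnorm(p * runif(n))     # uniform on (0, 1) scaled by p

all.equal(s_a, s_b)   # the two forms agree
all(s_a < a)          # every draw lies in the lower tail
```

Both forms draw from the normal distribution truncated to values below a via the inverse CDF. Resampling from a finite simulated pool such as s1[s1 < a] is an empirical approximation to the same distribution, limited to the values that happen to be in the pool.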
Re: [R] Sampling from data set
We'll probably need much more info, but this should get you started:

  nameOfDataSet[sample(1:10000, 100),]

You can replace the 10000 with dim(nameOfDataSet)[1] to make it more dynamic.

Jeff.

On Tue, Oct 5, 2010 at 3:07 AM, Jumlong Vongprasert jumlong.u...@gmail.com wrote:
> Dear all.
> I have data with 2 variables, x and y, of size 10000. I want to sample from this data with size 100. How can I do it?
> THANK.
>
> --
> Jumlong Vongprasert
> Institute of Research and Development
> Ubon Ratchathani Rajabhat University
> Ubon Ratchathani THAILAND 34000
Re: [R] Sampling from data set
If poprho.3 was your dataset as a data.frame (an object with row and column dimensions), you need a comma following the row selection (sample(...)) to indicate that you want to select those rows and all columns:

  newsample <- poprho.3[sample(1:10000,100),]   # note the last comma in the brackets

General use is:

  my.data.frame[rows,columns]

where either rows or columns (or both) can be left blank to indicate that you want all of them. Similarly, a selection of the first column would have been (comma followed by column number):

  newsample <- poprho.3[sample(1:10000,100),1]

That's why your:

  newsample <- as.matrix(nameofdataset[sample(1:10000,100),])

worked; the as.matrix wasn't necessary to simply sample the data.

Cheers,
Jeff.

On Tue, Oct 5, 2010 at 3:54 AM, Jumlong Vongprasert jumlong.u...@gmail.com wrote:
> Dear Jeffrey.
> I used newsample <- as.matrix(nameofdataset[sample(1:10000,100),]). Now it includes both variables. Thank you for your answer to inspire.
> Jumlong
>
> 2010/10/5 Jeffrey Spies jsp...@virginia.edu
> > We'll probably need much more info, but this should get you started:
> >
> >   nameOfDataSet[sample(1:10000, 100),]
> >
> > You can replace the 10000 with dim(nameOfDataSet)[1] to make it more dynamic.
> >
> > Jeff.
> >
> > On Tue, Oct 5, 2010 at 3:07 AM, Jumlong Vongprasert jumlong.u...@gmail.com wrote:
> > > Dear all.
> > > I have data with 2 variables, x and y, of size 10000. I want to sample from this data with size 100. How can I do it?
> > > THANK.
> > >
> > > --
> > > Jumlong Vongprasert
> > > Institute of Research and Development
> > > Ubon Ratchathani Rajabhat University
> > > Ubon Ratchathani THAILAND 34000
>
> --
> Jumlong Vongprasert
> Institute of Research and Development
> Ubon Ratchathani Rajabhat University
> Ubon Ratchathani THAILAND 34000
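The effect of the comma, and of selecting a single column, can be seen on a stand-in data frame. The data below are made up for illustration (two columns and 10000 rows are assumptions chosen to mirror the question):

```r
set.seed(1)
d <- data.frame(x = rnorm(10000), y = rnorm(10000))   # stand-in data

rows <- sample(nrow(d), 100)   # nrow(d) avoids hard-coding the size

s_all <- d[rows, ]                  # 100 rows, all columns
s_vec <- d[rows, 1]                 # first column only, returned as a vector
s_df  <- d[rows, 1, drop = FALSE]   # first column, kept as a data frame

dim(s_all)
```

Selecting one column drops the result to a plain vector by default; drop = FALSE keeps the data frame structure when that matters for later code.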
Re: [R] sampling from normal distribution
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of solafah bh
> Sent: Sunday, October 03, 2010 3:39 PM
> To: R help mailing list
> Subject: [R] sampling from normal distribution
>
> Hello
> If I want to resample from the tails of a normal distribution, are these commands equivalent?
>
>   upper tail: qnorm(runif(n,pnorm(b),1))    if b is an upper tail boundary
> or
>   upper tail: qnorm((1-p)+p(runif(n))       if p is the probability of each interval (the observations are divided into intervals)
>
> Regards

Yes, they are equivalent, although the second formula is missing a closing parenthesis and a multiplication operator. You could also simplify the second formula to

  qnorm(1-p*runif(n))

Hope this is helpful,
Dan

Daniel Nordlund
Bothell
Re: [R] sampling from normal distribution
On 03/10/2010 6:38 PM, solafah bh wrote:
> Hello
> If I want to resample from the tails of a normal distribution, are these commands equivalent?
>
>   upper tail: qnorm(runif(n,pnorm(b),1))    if b is an upper tail boundary
> or
>   upper tail: qnorm((1-p)+p(runif(n))       if p is the probability of each interval (the observations are divided into intervals)

You don't say how far up in the tail you are going, but if b is very large, you have to watch out for rounding error. For example, with b=10, pnorm(b) will be exactly equal to 1, and both versions will fail.

In general for b > 0 you'll get a bit more accuracy by sampling from the lower tail using -b. For really extreme cases you will probably need to switch to a log scale. For example, to get a random sample from a normal, conditional on being larger than 20, you'd want something like

  n <- 10
  logp1 <- pnorm(-20, log=TRUE)
  logprobs <- log(runif(n)) + logp1
  -qnorm(logprobs, log=TRUE)

Duncan Murdoch
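Both points in the reply above can be checked numerically. This is a sketch that follows the posted code, with the full argument name log.p spelled out and b = 20 as in the example; the variable names naive and x are illustrative.

```r
set.seed(1)
n <- 10

# With b = 10 the upper-tail mass is lost to rounding
pnorm(10) == 1                           # TRUE in double precision
naive <- qnorm(runif(n, pnorm(10), 1))   # every draw comes out as Inf

# Log-scale version for the tail beyond b
b <- 20
logp1    <- pnorm(-b, log.p = TRUE)      # log P(Z < -b)
logprobs <- log(runif(n)) + logp1        # log of uniform draws on (0, P(Z < -b))
x <- -qnorm(logprobs, log.p = TRUE)      # reflect lower-tail draws to the upper tail

range(x)
```

Working with log-probabilities keeps the tiny tail mass representable, and sampling in the lower tail before negating uses the end of the scale where the normal CDF is computed most accurately.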
Re: [R] sampling one random frame from each unique trial?
Hi: Try this: do.call(rbind, lapply(split(h, h$file), function(x) x[sample(1:nrow(x), 1), ])) My test returns file time_pred distance_1 distance_2 12.03.08_ins_odo_01 12.03.08_ins_odo_01 210 19.003 18.023 12.03.08_ins_odo_02 12.03.08_ins_odo_0290 13.668 12.950 12.03.08_ins_odo_03 12.03.08_ins_odo_03 120 21.220 26.370 12.03.08_ins_odo_07 12.03.08_ins_odo_07 180 16.301 19.976 distance_3 12.03.08_ins_odo_01 14.666 12.03.08_ins_odo_02 13.506 12.03.08_ins_odo_03 23.962 12.03.08_ins_odo_07 25.309 The function does the following: (1) Splits the data frame into a list, where each component of the list is a sub-data frame. (2) Applies the (anonymous) sampling function to each list component (lapply) (3) Combines the individual outputs together using the rbind function (do.call) Since this is the raison d'etre of the plyr package, one can also use library(plyr) ddply(d, 'file', function(x) x[sample(1:nrow(x), 1), ]) file time_pred distance_1 distance_2 distance_3 1 12.03.08_ins_odo_01 270 15.694 9.285 4.135 2 12.03.08_ins_odo_02 270 17.252 18.235 18.661 3 12.03.08_ins_odo_03 240 18.117 19.111 19.870 4 12.03.08_ins_odo_0790 19.790 23.276 18.678 (Your results may vary, but you do get one row per file as output.) HTH, Dennis On Sun, Jun 27, 2010 at 6:16 PM, Kristiina Hurme kristiina.hu...@uconn.eduwrote: hello everyone. please bear with me if this is very easy... I have a data set with many trials, and frames within each trial. I would like to pull out one random frame from each trial. here is an example. i have 4 unique trials (file), and various frames within each (time_pred). I would like to randomly sample 4 rows, but 1 from each trial (file). 
this sample data is called h file time_pred distance_1 distance_2 distance_3 1 12.03.08_ins_odo_01 210 19.003 18.023 14.666 2 12.03.08_ins_odo_01 240 23.905 20.087 17.266 3 12.03.08_ins_odo_01 270 15.694 9.285 4.135 4 12.03.08_ins_odo_02 0 22.142 16.061 14.776 5 12.03.08_ins_odo_0230 2.968 12.533 19.696 6 12.03.08_ins_odo_0260 6.175 17.701 20.198 7 12.03.08_ins_odo_0290 13.668 12.950 13.506 8 12.03.08_ins_odo_02 120 7.098 17.817 22.878 9 12.03.08_ins_odo_02 270 17.252 18.235 18.661 10 12.03.08_ins_odo_02 300 7.967 15.944 8.130 11 12.03.08_ins_odo_0390 18.724 17.931 21.148 12 12.03.08_ins_odo_03 120 21.220 26.370 23.962 13 12.03.08_ins_odo_03 150 21.225 24.376 20.194 14 12.03.08_ins_odo_03 180 22.298 24.119 24.606 15 12.03.08_ins_odo_03 210 8.413 14.464 15.219 16 12.03.08_ins_odo_03 240 18.117 19.111 19.870 17 12.03.08_ins_odo_0760 24.063 25.779 24.800 18 12.03.08_ins_odo_0790 19.790 23.276 18.678 19 12.03.08_ins_odo_07 120 15.617 23.707 19.545 20 12.03.08_ins_odo_07 150 24.818 22.373 24.515 21 12.03.08_ins_odo_07 180 16.301 19.976 25.309 22 12.03.08_ins_odo_07 210 23.843 24.772 26.025 23 12.03.08_ins_odo_07 240 9.029 15.125 20.139 24 12.03.08_ins_odo_07 270 6.533 22.833 23.618 here is my code so far... random -for(i in unique(file)){h[sample(1:24,1),]} random but this only gives me one sample... and if I try to exclude naming it as random, then nothing comes up. i'm confused and very new to R. please help! many many thanks! kristiina -- View this message in context: http://r.789695.n4.nabble.com/sampling-one-random-frame-from-each-unique-trial-tp2270396p2270396.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] sampling one random frame from each unique trial?
Hi,

take the following example and proceed accordingly.

  Name=c("Miller","Miller","Miller","Miller","Smith","Smith","Smith","Smith")
  X=rnorm(8)
  Year=rep(2000:2003,2)
  d=data.frame(Name,X,Year)

  #Row indices
  rows=1:dim(d)[1]

  #Which Name occupies which rows?
  #Name would be your file
  w=function(x){which(Name%in%unique(x))}
  samplefrom=tapply(Name,Name,w)

  #Sample one row index for each Name and
  #give the data frame d for these row indices
  f=function(x){sample(x,1)}
  d[unlist(lapply(samplefrom,f)),]

HTH,
Daniel
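One caveat with a helper like `f` above: `sample(x, 1)` treats a single number `x` as shorthand for `1:x`, so a name that occupies exactly one row could return the wrong row index. A minimal sketch of a safer helper follows; the name `sample_one` and the small list `rows_by_name` are illustrative assumptions, not part of the original reply.

```r
# Pick one element of x by sampling a position, which is safe
# even when x has length 1 (sample(7, 1) would draw from 1:7).
sample_one <- function(x) x[sample.int(length(x), 1)]

# Hypothetical row indices per name; "Smith" occupies a single row
rows_by_name <- list(Miller = 1:4, Smith = 7L)

set.seed(1)
picked <- vapply(rows_by_name, sample_one, integer(1))
picked
```

Here `picked[["Smith"]]` can only be 7, whereas `sample(7L, 1)` could return any of 1 to 7.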
Re: [R] Sampling with replacement
On 06/17/2010 03:27 AM, Somnath Somnath wrote:

> Thanks for all those reply. Is there any general rule to determine how many samples I would get from a population of size n, I draw a sample of size m (m may be greater than n) if sample is drawn with replacement?

Hi Somnath,

If you mean how many unique values, I think this is the occupancy problem that is discussed in:

Feller, W. (1950) An introduction to probability theory and its applications (Vol 1). New York: Wiley.

and probably other places. You can calculate the probability of obtaining each possible number of outcomes using the Maxwell-Boltzmann distribution.

Jim
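If the question is about the number of unique values, one quantity that is easy to check is its expectation. The sketch below uses the standard result that each of the n values is missed by all m draws with probability (1 - 1/n)^m, so the expected number of distinct values is n * (1 - (1 - 1/n)^m). The formula and the function name `expected_unique` are additions here, not taken from the reply; the simulation is only a sanity check.

```r
# Expected number of distinct values when drawing m times
# with replacement from a population of size n
expected_unique <- function(n, m) n * (1 - (1 - 1 / n)^m)

n <- 20
m <- 4
theory <- expected_unique(n, m)

# Monte Carlo check: average count of distinct values over many draws
set.seed(1)
sim <- mean(replicate(5000, length(unique(sample(n, m, replace = TRUE)))))

c(theory = theory, simulated = sim)
```

The full distribution of the count (not just its mean) is what the occupancy problem describes.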
Re: [R] Sampling with replacement
sample(1:20,4,replace=TRUE) should do it.

Jun

On Wed, Jun 16, 2010 at 9:20 AM, Somnath Somnath somnath700...@gmail.com wrote:

> Dear all, good morning,
>
> I have a population, let say members are tagged with some simple number like 1,2,3,...20. I want to draw a sample with replacement of size 4 (say, can be more than 20 also).
>
> Is there any R function which will show me all such possible samples?
>
> Thanks
Re: [R] Sampling with replacement
Try

  sample(20, 4, replace = TRUE)

HTH,
Jorge

On Wed, Jun 16, 2010 at 10:20 AM, Somnath Somnath wrote:

> Dear all, good morning,
>
> I have a population, let say members are tagged with some simple number like 1,2,3,...20. I want to draw a sample with replacement of size 4 (say, can be more than 20 also).
>
> Is there any R function which will show me all such possible samples?
>
> Thanks
Re: [R] Sampling with replacement
If you for some reason want to be shown all the possible combinations, try

  expand.grid(1:20,1:20,1:20,1:20)

(ugly code). Don't use this for sampling.

hth
Rafael

2010/6/16 Jorge Ivan Velez jorgeivanve...@gmail.com

> Try
>
>   sample(20, 4, replace = TRUE)
>
> HTH,
> Jorge
Re: [R] Sampling with replacement
On Jun 16, 2010, at 10:20 AM, Somnath Somnath wrote:

> Dear all, good morning,
>
> I have a population, let say members are tagged with some simple number like 1,2,3,...20. I want to draw a sample with replacement of size 4 (say, can be more than 20 also).

Already answered on the list.

> Is there any R function which will show me all such possible samples?

  ?expand.grid

  nrow(expand.grid(1:20, 1:20, 1:20, 1:20))
  [1] 160000

--
David Winsemius, MD
West Hartford, CT
Re: [R] Sampling with replacement
Hi Rafael,

You might try:

  r <- expand.grid(rep(list(1:20), 4))
  dim(r)
  [1] 160000      4

HTH,
Jorge

2010/6/16 Rafael Björk

> If you for some reason want to be shown all the possible combinations, try
>
>   expand.grid(1:20,1:20,1:20,1:20)
>
> (ugly code). Don't use this for sampling.
>
> hth
> Rafael
Re: [R] Sampling with replacement
How about

  library(TeachingSampling)
  SupportWR(20,4)

Tom
Re: [R] Sampling with replacement
Thanks for all the replies. Is there any general rule to determine how many samples I would get if, from a population of size n, I draw a sample of size m (m may be greater than n) and the sample is drawn with replacement?

Thanks,

2010/6/16 Jorge Ivan Velez jorgeivanve...@gmail.com

> Hi Rafael,
>
> You might try:
>
>   r <- expand.grid(rep(list(1:20), 4))
>   dim(r)
>   [1] 160000      4
>
> HTH,
> Jorge
Re: [R] Sampling with replacement
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Somnath Somnath
Sent: Wednesday, June 16, 2010 10:28 AM
To: r-help@r-project.org
Subject: Re: [R] Sampling with replacement

> Thanks for all those reply. Is there any general rule to determine how many samples I would get from a population of size n, I draw a sample of size m (m may be greater than n) if sample is drawn with replacement?

If you consider two samples equivalent if they differ only in their ordering (e.g., c(1,2,2) is equivalent to c(2,1,2) and c(2,2,1)) then the answer is

  choose(n+m-1, m)

If order matters then it is

  n^m

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
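Both counts can be checked by brute force on a small case. The sketch below enumerates every ordered sample with expand.grid, then collapses samples that differ only in ordering by sorting each row. The sizes n = 4 and m = 3 and the object names are arbitrary choices for illustration.

```r
n <- 4  # population size
m <- 3  # sample size

# Every ordered sample of size m drawn with replacement from 1:n
ordered <- expand.grid(rep(list(seq_len(n)), m))

# Sort within each sample so that c(1,2,2), c(2,1,2) and c(2,2,1)
# become the same row, then keep the distinct rows
unordered <- unique(t(apply(ordered, 1, sort)))

nrow(ordered)    # compare with n^m
nrow(unordered)  # compare with choose(n + m - 1, m)
```

For realistic sizes the enumeration grows as n^m, so the closed-form counts are the practical route.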
Re: [R] Sampling with replacement
Somnath Somnath somnath700...@gmail.com [Wed, Jun 16, 2010 at 07:27:32PM CEST]:

> Thanks for all those reply. Is there any general rule to determine how many samples I would get from a population of size n, I draw a sample of size m (m may be greater than n) if sample is drawn with replacement?

n^m

--
Johannes Hüsing               There is something fascinating about science.
                              One gets such wholesale returns of conjecture
mailto:johan...@huesing.name  from such a trifling investment of fact.
http://derwisch.wikidot.com   (Mark Twain, Life on the Mississippi)
Re: [R] Sampling from Bivariate Uniform Distribution
The correlation will not be exactly 0, but will represent a draw from an independent population. There may be something in the copulas package to allow for more independence (but that about exhausts my knowledge of that package).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Haneef_An
> Sent: Monday, February 15, 2010 11:53 AM
> To: r-help@r-project.org
> Subject: Re: [R] Sampling from Bivariate Uniform Distribution
>
> When I wrap those values in to a matrix will it be still independent ? ( non zero correlation).
>
> Can I do this for any multivariate distribution which has the univariate form?
>
> Thank you for the response.
>
> Haneef
Re: [R] Sampling from Bivariate Uniform Distribution
When I wrap those values into a matrix, will they still be independent (nonzero correlation)?

Can I do this for any multivariate distribution which has the univariate form?

Thank you for the response.

Haneef
Re: [R] Sampling from Bivariate Uniform Distribution
The runif function generates random numbers from a uniform distribution; wrap those values into a matrix and you have a multidimensional uniform distribution. If you want more than this, give us more detail.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Haneef Anver
> Sent: Wednesday, February 10, 2010 1:29 PM
> To: r-help@r-project.org
> Subject: [R] Sampling from Bivariate Uniform Distribution
>
> Hello all!!!
>
> 1) I am wondering is there a way to generate random numbers in R for Bivariate Uniform distribution?
>
> 2) Does R have built-in function for generating random numbers for any given bivariate distribution.
>
> Any help would be greatly appreciated !!
>
> Good day!
>
> Haneef Anver
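As a minimal sketch of the suggestion, assuming independent coordinates on the unit square is what is wanted: two runif() vectors bound into a matrix give draws from the bivariate uniform distribution on [0, 1] x [0, 1]. The sample size and the names `xy` and `r` are arbitrary.

```r
set.seed(2010)
n <- 10000

# Each row is one draw (x, y); the columns are generated independently
xy <- cbind(x = runif(n), y = runif(n))

# The sample correlation is close to, but not exactly, zero
r <- cor(xy[, "x"], xy[, "y"])
r
```

The columns are independent by construction; the small nonzero value of `r` is sampling noise in a finite sample, not dependence in the population.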
Re: [R] Sampling theory
On Tue, 19 Jan 2010, Christian Hennig wrote:

> are there any R-packages for computations required in sampling theory (such as confidence intervals under random, stratified, cluster sampling; I'd be particularly interested in confidence intervals for the population variance, which is difficult enough to find even in books)?

Yes, these are in the survey package, for fairly general designs, using linearization or replicate weights.

I don't know how good the confidence intervals for the variance are. One of the disadvantages of implementing survey estimators in a general way is that you lose the opportunity to use bias corrections that are only available for simple cases.

The forthcoming version 3.19 (later this week) has nicer output for the population variance, but the computations are still the same.

   -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle
Re: [R] Sampling from a Postgres database
One way could be to first select only the unique ID's, sample this and then select only the relevant records:

  strQuery = "SELECT ID from tblFoo;"
  IDs <- sqlQuery(channel, strQuery)
  # sqlQuery returns a data frame, so sample from its ID column
  sample.IDs <- sample(IDs$ID, 10)
  # collapse the sampled IDs into one comma-separated list for IN (...)
  strQuery = paste("SELECT * from tblFoo WHERE ID IN (", paste(sample.IDs, collapse = ","), ");")
  # fetch the full records for the sampled IDs
  sampled <- sqlQuery(channel, strQuery)

Bart

christiaan pauw-2 wrote:

> Hi Everybody
>
> Is there a way in which one can use the RPostgreSQL package to take a sample from a table in Postgres database without having to read the whole table into R
>
> regards
> Christiaan
Re: [R] Sampling from a Postgres database
On 01/15/2010 01:49 AM, Bart Joosen wrote:

> One way could be to first select only the unique ID's, sample this and then select only the relevant records:
>
>   strQuery = "SELECT ID from tblFoo;"
>   IDs <- sqlQuery(channel, strQuery)
>   sample.IDs <- sample(IDs,10)
>   strQuery = paste("SELECT ID from tblFoo WHERE ID IN(", sample.IDs, ");")
>   IDs <- sqlQuery(channel, strQuery)

Better is to use the built-in random() function in Postgres:

  # select count(*) from visits;
    count
  ---------
   4846604
  (1 row)

  # select count(*) from visits where random() < 0.005;
   count
  -------
   24391
  (1 row)

HTH,
Joe
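From the R side, both suggestions in this thread come down to building the right SQL string before sending it to the server. The sketch below only constructs the strings and makes no database connection; the helper names `build_sample_query` and `build_in_query` are hypothetical, and the table and column names are taken from the examples above. Table and column names are pasted in unescaped, so this assumes trusted input.

```r
# Server-side sampling: keep each row with probability 'fraction'
build_sample_query <- function(table, fraction) {
  stopifnot(is.numeric(fraction), fraction > 0, fraction <= 1)
  sprintf("SELECT * FROM %s WHERE random() < %s;",
          table, format(fraction, scientific = FALSE))
}

# Client-side sampling: fetch only the rows whose IDs were sampled in R
build_in_query <- function(table, id_col, ids) {
  sprintf("SELECT * FROM %s WHERE %s IN (%s);",
          table, id_col, paste(ids, collapse = ", "))
}

q1 <- build_sample_query("visits", 0.005)
q2 <- build_in_query("tblFoo", "ID", c(3, 17, 42))
q1
q2

# With a live connection 'con' (assumed, not created here), the string
# could then be passed to something like DBI::dbGetQuery(con, q1).
```

The random() filter returns a sample of varying size (about the requested fraction of the rows), whereas the IN (...) form returns exactly the IDs that were sampled in R.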
Re: [R] Sampling dataframe
Here are some options that may help you out. First, let's put the data in a format that can be cut-and-pasted into R.

  myData <- read.table(textConnection("var1 var2 var3
  1 1 1 1
  2 3 1 2
  3 8 1 3
  4 6 1 4
  5 10 1 5
  6 2 2 1
  7 4 2 2
  8 6 2 3
  9 8 2 4
  10 10 2 5"),header=TRUE,row.names=1)
  closeAllConnections()

or use dput

  myData <- structure(list(var1 = c(1L, 3L, 8L, 6L, 10L, 2L, 4L, 6L, 8L, 10L),
      var2 = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
      var3 = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)),
      .Names = c("var1", "var2", "var3"), class = "data.frame",
      row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

  # Select data where var2 == 1
  select_v2 <- myData[myData$var2==1,]

  # sample two rows of select_v2
  sampled_v2 <- select_v2[sample(1:nrow(select_v2),2),]

  # select rows of var3 not equal to 1
  select_v3 <- myData[myData$var3 !=1,]

  # ?rbind may also come in useful.

2009/11/25 Ronaldo Reis Júnior chrys...@gmail.com:

> Hi,
>
> I have a table like that:
>
> datatest
>    var1 var2 var3
> 1     1    1    1
> 2     3    1    2
> 3     8    1    3
> 4     6    1    4
> 5    10    1    5
> 6     2    2    1
> 7     4    2    2
> 8     6    2    3
> 9     8    2    4
> 10   10    2    5
>
> I need to create another table based on that with the rules:
>
> take a random sample by var2==1 (2 sample rows for example):
>
>    var1 var2 var3
> 1     1    1    1
> 4     6    1    4
>
> In this random sample I get the values 1 and 4 for var3; now I need to complete the table with the var2==2 rows whose var3 values were not selected for var2==1.
>
> The resulting table is:
>
>    var1 var2 var3
> 1     1    1    1
> 4     6    1    4
> 7     4    2    2
> 8     6    2    3
> 10   10    2    5
>
> The values 1 and 4 of var3 are not present among the var2==2 rows.
>
> I tried several options but without success. Taking a random value is easy, but I can't select the other values while excluding the randomly selected ones.
>
> Any help?
>
> Thanks
> Ronaldo
>
> --
> 17th law - Your adviser wants you to become famous, so that he can, finally, become famous.
>
>   --Herman, I. P. 2007. Following the law. NATURE, Vol 445, p. 228.
> --
> Prof. Ronaldo Reis Júnior
> UNIMONTES/DBG/Lab. Ecologia Comportamental e Computacional
> Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia
> CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil
> Fone: (38) 3229-8192 | ronaldo.r...@unimontes.br | chrys...@gmail.com
> http://www.ppgcb.unimontes.br/lecc | ICQ#: 5692561 | LinuxUser#: 205366
> --
> Please DO NOT SEND Word or Powerpoint files.
> Prefer sending PDF, text, OpenOffice (ODF), HTML, or RTF.
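Putting the pieces together, one possible reading of the request is: sample two rows from the var2 == 1 block, then keep the var2 == 2 rows whose var3 value was not drawn, and stack the two parts with rbind. The sketch below follows that reading; the object names `grp1`, `picked`, `grp2`, `rest` and `result` are introduced here for illustration.

```r
myData <- data.frame(
  var1 = c(1, 3, 8, 6, 10, 2, 4, 6, 8, 10),
  var2 = rep(1:2, each = 5),
  var3 = rep(1:5, times = 2)
)

set.seed(1)

# Step 1: two random rows from the var2 == 1 block
grp1   <- myData[myData$var2 == 1, ]
picked <- grp1[sample(nrow(grp1), 2), ]

# Step 2: var2 == 2 rows whose var3 was NOT drawn in step 1
grp2 <- myData[myData$var2 == 2, ]
rest <- grp2[!(grp2$var3 %in% picked$var3), ]

# Step 3: combine both parts
result <- rbind(picked, rest)
result
```

Whatever rows are drawn, `result` has five rows and each var3 value from 1 to 5 appears exactly once.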
Re: [R] Sampling procedure
On Oct 15, 2009, at 10:19 AM, Marcio Resende wrote:

> I would like to divide a vector in 9 groups in a way that each number is present in only one group. In a vector of 783 I would like to divide in 9 different groups of 87
>
>   Example <- matrix(c(1:783),ncol = 1)

  Example <- matrix(c(1:783),ncol = 1)
  Grp1 <- sample(Example, 87, replace=FALSE)
  Grp2 <- sample(Example[-Grp1], 87, replace=FALSE)
  Grp3 <- sample(Example[-c(Grp1, Grp2)], 87, replace=FALSE)
  # lather, rinse, repeat

>   s1 <- as.matrix(sample(Example,87, re = FALSE))
>   Example <- Example[-s1]
>   s2 <- as.matrix(sample(Example,87, re = FALSE))
>   #however I don't know how to remove the second group from the Example to continue sampling.

  #Don't mess up the original

> There is probably an easy and faster way to do this. Could anybody help me?
> Thanks

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
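Note that negative indexing such as `Example[-Grp1]` removes positions, which matches the sampled values here only because the vector is 1:783. A hedged variant that removes by value uses setdiff, sketched below on a made-up vector `values` whose entries differ from their positions; it assumes all values are distinct.

```r
set.seed(123)

# Made-up data: 783 distinct values that are not equal to their positions
values <- seq(1000, by = 3, length.out = 783)

# Draw each group from whatever has not been used yet (removal by value)
Grp1 <- sample(values, 87)
Grp2 <- sample(setdiff(values, Grp1), 87)
Grp3 <- sample(setdiff(values, c(Grp1, Grp2)), 87)

# No value appears in more than one group
length(intersect(Grp1, Grp2))
length(intersect(c(Grp1, Grp2), Grp3))
```

Repeating this nine times works but is tedious; shuffling once and splitting does the same job in one step.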
Re: [R] Sampling procedure
If I understand what is wanted correctly, this can be a one-liner! -- think whole objects:

  splitup <- function(x,n.groups)
  #split x into n.groups mutually exclusive sets
  {
     lx <- length(x)
     if(n.groups >= lx) stop("Number of groups greater than vector length")
     x <- x[sample(lx,lx)]
     split(x,seq_len(n.groups))
  }

  ## testit
  splitup(1:71,9)
  $`1`
  [1] 22 26 38 50 65 60  9 27

  $`2`
  [1] 24  2 69 28 71 31 41 13

  $`3`
  [1] 16 47 63 45 23  1  8 32

  $`4`
  [1] 34 39 64 35  7 19  4 55

  $`5`
  [1] 54 10 37 68  6 17 70 18

  $`6`
  [1] 61 11  5 46 33 43 14 56

  $`7`
  [1] 42 44 12 62 66 48 57 58

  $`8`
  [1] 21 40 30 29 20 49 52 67

  $`9`
  [1] 59 15 25 51  3 36 53

Cheers,
Bert Gunter
Genentech Nonclinical Statistics
Re: [R] Sampling procedure
If parsimony is needed, then define a 9-row matrix and send a randomized indexed version of Example to it:

  s <- matrix(NA, nrow=9, ncol=length(Example)/9)
  s[,] <- Example[sample(Example, length(Example) )]
  str(s)
   int [1:9, 1:87] 503 731 708 23 255 675 163 381 361 412 ...

Or even:

  s <- matrix(Example[ sample(Example, length(Example) )], nrow=9, ncol=length(Example)/9)

--
David
Re: [R] Sampling procedure
... except the matrix approach doesn't work if the length of the vector is not exactly divisible by the number of groups. That's why I used split.

Cheers,
Bert Gunter
Genentech Nonclinical Biostatistics
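The shuffle-then-split idea can be sketched so that it also covers lengths that are not a multiple of the number of groups, without relying on recycling of the grouping vector. The function name `split_random` is an assumption introduced here; it is a variation on the approach discussed in this thread, not anyone's posted code.

```r
# Shuffle x once, then deal the shuffled values into n_groups groups
# in round-robin order; group sizes differ by at most one.
split_random <- function(x, n_groups) {
  stopifnot(n_groups >= 1, n_groups <= length(x))
  shuffled <- x[sample.int(length(x))]
  # rep_len() builds a group label for every element explicitly
  split(shuffled, rep_len(seq_len(n_groups), length(shuffled)))
}

set.seed(2009)
groups <- split_random(1:71, 9)
sizes  <- lengths(groups)
sizes
```

With 71 values and 9 groups this gives eight groups of 8 and one group of 7, and every value lands in exactly one group.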
Re: [R] Sampling procedure
OK, you're right. I thought it might be a simple fix to increase the number of columns to accommodate, but the recycling convention trips up that strategy.

Thanks;
David.

On Oct 15, 2009, at 11:55 AM, Bert Gunter wrote:

... except the matrix approach doesn't work if the length of the vector is not exactly divisible by the number of groups. That's why I used split.

Cheers,
Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Thursday, October 15, 2009 8:48 AM
To: Bert Gunter
Cc: 'Marcio Resende'; r-help@r-project.org
Subject: Re: [R] Sampling procedure

If parsimony is needed, then define a 9-row matrix and send a randomized indexed version of Example to it:

s <- matrix(NA, nrow=9, ncol=length(Example)/9)
s[,] <- Example[sample(Example, length(Example))]
str(s)
 int [1:9, 1:87] 503 731 708 23 255 675 163 381 361 412 ...

Or even:

s <- matrix(Example[sample(Example, length(Example))], nrow=9, ncol=length(Example)/9)

--
David

On Oct 15, 2009, at 11:22 AM, Bert Gunter wrote:

If I understand what is wanted correctly, this can be a one-liner! -- think whole objects:

splitup <- function(x, n.groups) # split x into n.groups mutually exclusive sets
{
   lx <- length(x)
   if(n.groups > lx) stop("Number of groups greater than vector length")
   x <- x[sample(lx, lx)]
   split(x, seq_len(n.groups))
}

## testit
splitup(1:71, 9)

$`1`
[1] 22 26 38 50 65 60  9 27

$`2`
[1] 24  2 69 28 71 31 41 13

$`3`
[1] 16 47 63 45 23  1  8 32

$`4`
[1] 34 39 64 35  7 19  4 55

$`5`
[1] 54 10 37 68  6 17 70 18

$`6`
[1] 61 11  5 46 33 43 14 56

$`7`
[1] 42 44 12 62 66 48 57 58

$`8`
[1] 21 40 30 29 20 49 52 67

$`9`
[1] 59 15 25 51  3 36 53

Cheers,
Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Thursday, October 15, 2009 7:55 AM
To: Marcio Resende
Cc: r-help@r-project.org
Subject: Re: [R] Sampling procedure

On Oct 15, 2009, at 10:19 AM, Marcio Resende wrote:

> I would like to divide a vector in 9 groups in a way that each number
> is present in only one group.
> In a vector of 783 I would like to divide in 9 different groups of 87
>
> Example <- matrix(c(1:783), ncol = 1)

Example <- matrix(c(1:783), ncol = 1)
Grp1 <- sample(Example, 87, replace=FALSE)
Grp2 <- sample(Example[-Grp1], 87, replace=FALSE)
Grp3 <- sample(Example[-c(Grp1, Grp2)], 87, replace=FALSE)
# lather, rinse, repeat

> s1 <- as.matrix(sample(Example, 87, re = FALSE))
> Example <- Example[-s1]
> s2 <- as.matrix(sample(Example, 87, re = FALSE))
> # however I don't know how to remove the second group from the Example
> # to continue sampling.

# Don't mess up the original

> There is probably an easy and faster way to do this.
> Could anybody help me?
> Thanks

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
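A small variation on the split() idea for the case Bert raises, where the vector length is not a multiple of the number of groups. This is a sketch only, and splitup2 is a made-up name, not something from the thread. As far as I know, split() recycles a grouping vector that is shorter than the data and warns when the lengths do not divide evenly, so the sketch builds a full-length label vector with rep_len() and leaves split() nothing to recycle.

```r
# Hypothetical variant of the splitup() function quoted above
splitup2 <- function(x, n.groups) {
  lx <- length(x)
  if (n.groups > lx) stop("Number of groups greater than vector length")
  # one label per element, cycling through 1..n.groups
  labels <- rep_len(seq_len(n.groups), lx)
  # shuffle the values, then hand them out according to the labels
  split(x[sample.int(lx)], labels)
}

set.seed(1)                 # only so the sketch is reproducible
res <- splitup2(1:71, 9)
lengths(res)                # group sizes differ by at most one
```

With 783 values and 9 groups the same call gives nine groups of 87, since 783 is an exact multiple of 9.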
Re: [R] Sampling of non-overlapping intervals of variable length
On Jul 19, 2009, at 1:05 PM, Hadassa Brunschwig wrote:

> Hi,
>
> I hope I am not repeating a question which has been posed already.
> I am trying to do the following in the most efficient way:
> I would like to sample from a finite (large) set of integers n
> non-overlapping intervals, where each interval i has a different, set
> length L_i (which is the number of integers in the interval).
> I had the idea to sample recursively on a vector with the already
> chosen intervals discarded but that seems to be too complicated.

It might be ridiculously easy if you sampled on an index of a group of intervals.

Why not pose the question in the form of example data.frames or other classes of R objects? Specification of the desired output would be essential. I think further specification of the sampling strategy would also help because I am unable to understand what sort of probability model you are hoping to apply.

> Any suggestions on that?
>
> Thanks a lot.
>
> Hadassa
>
> --
> Hadassa Brunschwig
> PhD Student
> Department of Statistics

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] Sampling of non-overlapping intervals of variable length
Hi

I am not sure what you mean by sampling an index of a group of intervals. I will try to give an example:
Let's assume I have a vector 1:100. Let's say I have 10 intervals of different but known length, say, c(4,6,11,2,8,14,7,2,18,32). For simulation purposes I have to sample those 10 intervals 1000 times. The requirement is, however, that they should be of those lengths and should not be overlapping. In short, I would like to obtain a 10x1000 matrix with sampled intervals.

Thanks
Hadassa

On Sun, Jul 19, 2009 at 9:48 PM, David Winsemius <dwinsem...@comcast.net> wrote:

> On Jul 19, 2009, at 1:05 PM, Hadassa Brunschwig wrote:
>
> > Hi,
> >
> > I hope I am not repeating a question which has been posed already.
> > I am trying to do the following in the most efficient way:
> > I would like to sample from a finite (large) set of integers n
> > non-overlapping intervals, where each interval i has a different, set
> > length L_i (which is the number of integers in the interval).
> > I had the idea to sample recursively on a vector with the already
> > chosen intervals discarded but that seems to be too complicated.
>
> It might be ridiculously easy if you sampled on an index of a group of
> intervals.
>
> Why not pose the question in the form of example data.frames or other
> classes of R objects? Specification of the desired output would be
> essential. I think further specification of the sampling strategy would
> also help because I am unable to understand what sort of probability
> model you are hoping to apply.
>
> > Any suggestions on that?
> >
> > Thanks a lot.
> >
> > Hadassa
> >
> > --
> > Hadassa Brunschwig
> > PhD Student
> > Department of Statistics
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT

--
Hadassa Brunschwig
PhD Student
Department of Statistics
The Hebrew University of Jerusalem
http://www.stat.huji.ac.il
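One possible way to draw such intervals without any recursion, offered as a sketch under stated assumptions rather than a definitive answer. Note that the ten lengths in the example above add up to 104, so they cannot all fit into 1:100 without overlap; the sketch therefore assumes a range of 1:200. The function name sample_intervals and its arguments are made up for illustration. The idea is to shuffle the left-to-right order of the intervals and then decide at random how the unused integers are spread over the gaps before, between and after them. Under that scheme every valid placement should be equally likely, which is only one of several probability models one might want here.

```r
# Hypothetical helper: place intervals of the given lengths inside 1:N
# so that no two of them overlap.
sample_intervals <- function(N, lengths) {
  n    <- length(lengths)
  free <- N - sum(lengths)           # integers not covered by any interval
  if (free < 0) stop("intervals do not fit into 1:N")
  ord   <- sample.int(n)             # random left-to-right order
  # lay out n interval tokens among (free + n) slots; the other slots
  # are single uncovered integers
  slots <- sort(sample.int(free + n, n))
  gaps_before <- slots - seq_len(n)  # uncovered integers left of each token
  len_ord     <- lengths[ord]
  # start = uncovered integers to the left + covered integers to the left + 1
  starts_ord  <- gaps_before + cumsum(c(0, len_ord[-n])) + 1
  starts      <- integer(n)
  starts[ord] <- starts_ord          # back to the original interval order
  cbind(start = starts, end = starts + lengths - 1)
}

L <- c(4, 6, 11, 2, 8, 14, 7, 2, 18, 32)
set.seed(123)                        # only so the sketch is reproducible

# 10 x 1000 matrix of start positions; ends follow as start + L - 1
starts <- replicate(1000, sample_intervals(200, L)[, "start"])
dim(starts)
```

Row i of the result holds the 1000 sampled start positions for the interval of length L[i].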