Re: [R] Sample of a subsample
Yes.

Beating a pretty weary horse, a slightly cleaner version of my prior offering, using with() instead of within(), is:

## the assignment stays outside with(), which only returns a value;
## indexing by var1 works here because var1 equals the row number
dat$sampleNo[with(dat, sample(var1[!var1%%2 & !sampleNo], 10, replace=FALSE))] <- 2

with() and within() are convenient ways to avoid having to repeatedly name the columns via $. Note also the logical subscripting, in which numeric 0 is coerced to FALSE and any nonzero value to TRUE (which I should have done previously).

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Sep 25, 2017 at 11:43 AM, Eric Berger wrote:
> idx2 <- sample( which( (!data$var1%%2) & data$sampleNo==0 ), size=10, replace=F)
> data[idx2,]$sampleNo <- 2
> [...]
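For anyone following along, here is a minimal sketch of the two points made in this message: how `!` turns the numeric remainder into a logical, and why the assignment has to happen outside with(). The first sample is fixed by hand (rows 1 to 10) purely for illustration, so the eligible set is known in advance; that choice and the helper names `before`, `eligible` and `picked` are assumptions, not part of the thread.

```r
dat <- data.frame(var1 = 1:40, var2 = 40:1)
dat$sampleNo <- 0
dat$sampleNo[1:10] <- 1   # assumed first sample, fixed for illustration

# with() evaluates in a temporary environment built from the columns,
# so an assignment inside it never reaches dat itself.
before <- dat
with(dat, sampleNo[1] <- 99)
unchanged <- identical(before, dat)   # TRUE

# var1 %% 2 is 0 for even values; `!` coerces 0 to FALSE and negates it,
# so !var1 %% 2 is TRUE for even var1. Likewise !sampleNo is TRUE where
# sampleNo is 0.
eligible <- with(dat, var1[!var1 %% 2 & !sampleNo])

# Draw 10 of the eligible values and assign outside with().
# Indexing by var1 is only valid because var1 equals the row number here.
picked <- sample(eligible, 10, replace = FALSE)
dat$sampleNo[picked] <- 2
```

If var1 were not identical to the row number, indexing by row position via which() would be the safer choice.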
Re: [R] Sample of a subsample
Hi David,

I was about to post a reply when Bert responded. His answer is good, and his comment to use the name 'dat' rather than 'data' is instructive. I am providing my suggestion as well because I think it may address what was causing you some confusion (mainly the use of which(), but also the missing "!"):

idx2 <- sample( which( (!data$var1%%2) & data$sampleNo==0 ), size=10, replace=FALSE)
data[idx2,]$sampleNo <- 2

Eric

On Mon, Sep 25, 2017 at 9:03 PM, Bert Gunter wrote:
> For personal aesthetic reasons, I changed the name "data" to "dat".
> [...]
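To make the two points above concrete, here is a small sketch under stated assumptions: the first sample is fixed by hand (rows 2, 6 and 15) so the counts are predictable, and the names `odd_mask`, `even_mask` and `rows` are introduced only for illustration. It shows that the mask without "!" selects odd values, and that which() turns the logical mask into row numbers that sample() can draw from. (Calling sample() on the subsetted data frame itself, as in the original attempt, operates on its columns rather than its rows.)

```r
data <- data.frame(var1 = 1:40, var2 = 40:1)
data$sampleNo <- 0
data$sampleNo[c(2, 6, 15)] <- 1   # assumed first sample, for illustration

# Without "!": var1 %% 2 is 1 (coerced to TRUE) for ODD values.
odd_mask  <- data$var1 %% 2 & data$sampleNo == 0
# With "!": TRUE for even values that have not been sampled yet.
even_mask <- !data$var1 %% 2 & data$sampleNo == 0

# which() converts the logical mask into integer row numbers.
rows <- which(even_mask)

# Sample row numbers, then assign the label to exactly those rows.
idx2 <- sample(rows, size = 10, replace = FALSE)
data$sampleNo[idx2] <- 2
```

Sampling row numbers rather than values of var1 keeps the approach valid even when var1 does not coincide with the row index.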
Re: [R] Sample of a subsample
For personal aesthetic reasons, I changed the name "data" to "dat".

Your code, with a slight modification:

set.seed(1357) ## for reproducibility
dat <- data.frame(var1=seq(1:40), var2=seq(40,1))
dat$sampleNo <- 0
idx <- sample(seq(1,nrow(dat)), size=10, replace=FALSE)
dat[idx,"sampleNo"] <- 1

## yielding
> dat
   var1 var2 sampleNo
1     1   40        0
2     2   39        1
3     3   38        0
4     4   37        0
5     5   36        0
6     6   35        1
7     7   34        0
8     8   33        0
9     9   32        0
10   10   31        0
11   11   30        0
12   12   29        0
13   13   28        0
14   14   27        0
15   15   26        1
16   16   25        1
17   17   24        0
18   18   23        0
19   19   22        0
20   20   21        1
21   21   20        0
22   22   19        1
23   23   18        0
24   24   17        1
25   25   16        0
26   26   15        1
27   27   14        0
28   28   13        0
29   29   12        0
30   30   11        0
31   31   10        0
32   32    9        0
33   33    8        0
34   34    7        0
35   35    6        1
36   36    5        0
37   37    4        1
38   38    3        0
39   39    2        0
40   40    1        0

## This is basically a transcription of your specification into indexing logic

dat <- within(dat, sampleNo[sample(var1[(var1%%2 == 0) & sampleNo==0], 10, replace=FALSE)] <- 2)

## yielding
> dat
   var1 var2 sampleNo
1     1   40        0
2     2   39        1
3     3   38        0
4     4   37        2
5     5   36        0
6     6   35        1
7     7   34        0
8     8   33        2
9     9   32        0
10   10   31        2
11   11   30        0
12   12   29        0
13   13   28        0
14   14   27        2
15   15   26        1
16   16   25        1
17   17   24        0
18   18   23        2
19   19   22        0
20   20   21        1
21   21   20        0
22   22   19        1
23   23   18        0
24   24   17        1
25   25   16        0
26   26   15        1
27   27   14        0
28   28   13        2
29   29   12        0
30   30   11        2
31   31   10        0
32   32    9        2
33   33    8        0
34   34    7        2
35   35    6        1
36   36    5        2
37   37    4        1
38   38    3        0
39   39    2        0
40   40    1        0

Bert Gunter

On Mon, Sep 25, 2017 at 10:27 AM, David Studer wrote:
> I was trying to solve it like this:
>
> idx2<-data$var1%%2 & data$sampleNo==0
> sample(data[idx2,], size=10, replace=F)
>
> But how can I set sampleNo to 2?
> [...]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
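The two-step procedure in this message can be wrapped in a small helper; the following is only a sketch, and `assign_subsample` with its arguments is a hypothetical name that does not come from the thread or from any package. It works on row positions rather than on the values of var1, and it stops early when fewer than `n` cases are eligible.

```r
# Hypothetical helper (not from the thread): label `n` randomly chosen
# rows of `df`, restricted to rows where `eligible` is TRUE.
assign_subsample <- function(df, eligible, n, label, col = "sampleNo") {
  rows <- which(eligible)
  if (length(rows) < n) stop("fewer than n eligible cases")
  # sample.int() on positions avoids sample()'s special case for a single
  # number, where sample(5, 1) would draw from 1:5 instead of returning 5.
  picked <- rows[sample.int(length(rows), n)]
  df[picked, col] <- label
  df
}

set.seed(1357)   # for reproducibility
dat <- data.frame(var1 = 1:40, var2 = 40:1, sampleNo = 0)

# First sample: any 10 rows. Second sample: 10 unsampled rows with even var1.
dat <- assign_subsample(dat, rep(TRUE, nrow(dat)), 10, 1)
dat <- assign_subsample(dat, dat$var1 %% 2 == 0 & dat$sampleNo == 0, 10, 2)

table(dat$sampleNo)
```

With 20 even values and at most 10 of them used by the first sample, at least 10 eligible cases always remain for the second draw in this example. The rows selected need not match the output shown above, since the helper draws differently from the original code.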
[R] Sample of a subsample
Hello everybody!

I have the following problem: I'd like to select a sample from a subsample in a dataset. Actually, I don't want to select it, but to create a new variable sampleNo that indicates to which sample (one or two) a case belongs.

Let's suppose I have a dataset containing 40 cases:

data <- data.frame(var1=seq(1:40), var2=seq(40,1))

The first sample (n=10) I drew like this:

data$sampleNo <- 0
idx <- sample(seq(1,nrow(data)), size=10, replace=F)
data[idx,]$sampleNo <- 1

Now (and here my problems start), I'd like to draw a second sample (n=10). But this sample should be drawn only from the cases that don't belong to the first sample. *Additionally, "var1" should be an even number.*

So sampleNo should be 0 for cases that were not drawn at all, 1 for cases that belong to the first sample, and 2 for cases belonging to the second sample (drawn from cases where sampleNo equals 0 and var1 is even).

I was trying to solve it like this:

idx2<-data$var1%%2 & data$sampleNo==0
sample(data[idx2,], size=10, replace=F)

But how can I set sampleNo to 2?

Thank you very much for your help!

David