[R] Randomly drop a percent of data from a data.frame
Hi,

I have the following data.

set.seed(6245)
data <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5),x4=rnorm(5))
round(data,digits=3)
      x1     x2     x3     x4
1  0.482  1.320 -0.859 -0.142
2 -0.753 -0.041 -0.063  0.886
3  0.028 -0.256 -0.069  0.354
4 -0.086  0.475  0.244  0.781
5  0.690 -0.181  1.274  1.633

What I would like to do is drop 20% of the data, but I want this 20% to come only from x3 and x4. It doesn't have to be even, i.e. I don't need to drop exactly 2 from x3 and 2 from x4, or make sure that an observation is missing data on only one variable. I just want to drop 20% of the data through x3 and x4 only. In other words,

      x1     x2     x3     x4
1  0.482  1.320 -0.859     NA
2 -0.753 -0.041 -0.063  0.886
3  0.028 -0.256     NA  0.354
4 -0.086  0.475     NA  0.781
5  0.690 -0.181     NA  1.633

OR

      x1     x2     x3     x4
1  0.482  1.320     NA -0.142
2 -0.753 -0.041 -0.063  0.886
3  0.028 -0.256     NA     NA
4 -0.086  0.475  0.244     NA
5  0.690 -0.181  1.274  1.633

OR

      x1     x2     x3     x4
1  0.482  1.320 -0.859 -0.142
2 -0.753 -0.041 -0.063     NA
3  0.028 -0.256 -0.069     NA
4 -0.086  0.475  0.244     NA
5  0.690 -0.181  1.274     NA

ETC. are all fine. Any ideas how I can do this?

Chris

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
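One way to get exactly the 4 missing cells that every example above shows (20% of the 5 x 4 = 20 cells) is to sample cell positions only inside x3 and x4 and blank them through a logical-matrix index. This is a minimal base-R sketch; the helper name `drop_cells()` and its `prop` argument are hypothetical, not something from the thread:

```r
# Hypothetical helper: set round(prop * all cells) entries to NA,
# choosing them only from the columns named in `cols`.
drop_cells <- function(df, cols, prop = 0.2) {
  k <- round(prop * nrow(df) * ncol(df))   # 20% of a 5 x 4 frame = 4 cells
  mask <- matrix(FALSE, nrow(df), length(cols))
  if (k > length(mask)) stop("not enough cells in 'cols' to drop")
  mask[sample(length(mask), k)] <- TRUE    # k distinct cells, column-major
  sub <- df[cols]
  sub[mask] <- NA                          # logical-matrix replacement
  df[cols] <- sub
  df
}

set.seed(6245)
data <- round(data.frame(x1 = rnorm(5), x2 = rnorm(5),
                         x3 = rnorm(5), x4 = rnorm(5)), digits = 3)
out <- drop_cells(data, c("x3", "x4"))
out
colSums(is.na(out))   # NAs per column; x1 and x2 stay at 0
```

Because `sample()` draws positions without replacement, the number of NAs is always exactly `k`; only which cells are hit changes with the seed.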
Re: [R] Randomly drop a percent of data from a data.frame
Hi,

Maybe this helps:

#data1 (changed `data` to `data1`)
set.seed(6245)
data1 <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5),x4=rnorm(5))
data1 <- round(data1,digits=3)
data2 <- data1
data1[,3:4] <- lapply(data1[,3:4], function(x){
  # keep a random 80% of the pooled x3/x4 values (a fresh sample per column);
  # values of x that were not drawn become NA
  x1 <- match(x, sample(unlist(data1[,3:4]), round(0.8*length(unlist(data1[,3:4])))))
  x[is.na(x1)] <- NA
  x
})
data1
#      x1     x2     x3     x4
#1  0.482  1.320     NA -0.142
#2 -0.753 -0.041 -0.063  0.886
#3  0.028 -0.256 -0.069  0.354
#4 -0.086  0.475  0.244  0.781
#5  0.690 -0.181  1.274  1.633

#or
data2[,3:4] <- lapply(data2[,3:4], function(x){
  x1 <- match(x, sample(unlist(data2[,3:4]), round(0.8*length(unlist(data2[,3:4])))))
  x[is.na(x1)] <- NA
  x
})
data2
#      x1     x2     x3     x4
#1  0.482  1.320 -0.859 -0.142
#2 -0.753 -0.041     NA     NA
#3  0.028 -0.256 -0.069  0.354
#4 -0.086  0.475  0.244  0.781
#5  0.690 -0.181  1.274  1.633

A.K.

----- Original Message -----
From: Christopher Desjardins <cddesjard...@gmail.com>
To: r-help@r-project.org
Sent: Friday, August 16, 2013 3:02 PM
Subject: [R] Randomly drop a percent of data from a data.frame
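Because `sample()` sits inside the function passed to `lapply()`, x3 and x4 each draw their own 80% sample of the pooled values, so the number of NAs varies from run to run (the two outputs above have one and two). A quick simulation makes this visible. `arun_drop()` is just the code above wrapped in a function so it can be repeated, and the seed and data are arbitrary:

```r
# The lapply() approach from the message above, wrapped for repetition.
arun_drop <- function(d) {
  d[, 3:4] <- lapply(d[, 3:4], function(x) {
    # each column draws its own sample of 8 of the 10 pooled x3/x4 values
    x1 <- match(x, sample(unlist(d[, 3:4]), round(0.8 * length(unlist(d[, 3:4])))))
    x[is.na(x1)] <- NA
    x
  })
  d
}

set.seed(1)
d <- data.frame(x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5), x4 = rnorm(5))
counts <- replicate(2000, sum(is.na(arun_drop(d))))
table(counts)   # possible NA counts are 0 to 4
mean(counts)    # about 2 on average
```

Each column loses 0, 1 or 2 of its own values, so the total is random with an average of 2 NAs (20% of the x3/x4 cells), not a fixed count.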
Re: [R] Randomly drop a percent of data from a data.frame
Hi,

Suppose the dataset had an odd number of columns:

set.seed(6458)
data2 <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5))
n <- prod(dim(data2))
n
#[1] 15
dummy <- rep(F,n/2)
dummy[sample(1:(n/2),n*.2)] <- T
dummy
#[1]  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE
data2[,c("x2", "x3")][matrix(dummy, nc = 2)] <- NA
#Error in `[<-.data.frame`(`*tmp*`, matrix(dummy, nc = 2), value = NA) :
#  unsupported matrix index in replacement
#In addition: Warning message:
#In matrix(dummy, nc = 2) :
#  data length [7] is not a sub-multiple or multiple of the number of rows [4]

I might do:

n1 <- 2*nrow(data2) ##for 2 columns
dummy <- rep(FALSE,n1)
dummy[sample(1:n1,n1*.2)] <- TRUE
data2[,c("x2","x3")][matrix(dummy,nc=2)] <- NA
data2
#           x1         x2         x3
#1 -0.55899744  0.6622481 -0.3305958
#2  0.12776368         NA         NA
#3 -1.09734838  0.2069539 -0.6997853
#4  0.75919499 -0.5683809  0.4752002
#5 -0.03063141 -0.7549605  2.6038635

A.K.

From: Richard Kwock <richardkw...@gmail.com>
To: arun <smartpink...@yahoo.com>
Cc: Christopher Desjardins <cddesjard...@gmail.com>; R help <r-help@r-project.org>
Sent: Friday, August 16, 2013 5:55 PM
Subject: Re: [R] Randomly drop a percent of data from a data.frame
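The error traces back to the dummy vector's length: `rep()` truncates a non-integer `times`, so `rep(F, 7.5)` has 7 elements, which cannot fill the 5 x 2 block of x2 and x3. The fix above also drops 20% of the x2/x3 cells (2 cells) rather than 20% of all 15 cells (3 cells). A sketch that sizes the mask from the block itself and, assuming the 20% should refer to the whole data frame as in the original question, drops 3 cells:

```r
set.seed(6458)
data2 <- data.frame(x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5))
n <- prod(dim(data2))                 # 15 cells in total
length(rep(FALSE, n / 2))             # 7: rep() truncates times = 7.5

cols <- c("x2", "x3")
# One mask entry per cell of the x2/x3 block, whatever the column count
mask <- matrix(FALSE, nrow(data2), length(cols))
k <- round(0.2 * n)                   # 3 = 20% of all 15 cells
mask[sample(length(mask), k)] <- TRUE
data2[cols][mask] <- NA
colSums(is.na(data2))                 # x1 has no NAs
```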
Re: [R] Randomly drop a percent of data from a data.frame
Hi,

Thanks for the help. What I actually ended up doing was writing a couple of for loops, and I got something that works.

Thanks.
Chris

On Fri, Aug 16, 2013 at 4:34 PM, arun <smartpink...@yahoo.com> wrote:
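Chris's loop code was not posted. For comparison only, here is a hypothetical loop-based sketch (the name `drop_with_loop()` is made up) that lists every x3/x4 cell, samples as many of them as 20% of the whole frame's cell count, and blanks them one at a time:

```r
# Hypothetical loop version: not the code Chris actually used.
drop_with_loop <- function(df, cols, prop = 0.2) {
  k <- round(prop * nrow(df) * ncol(df))          # cells to blank
  # Every (row, column) pair inside the allowed columns
  cells <- expand.grid(row = seq_len(nrow(df)), col = cols,
                       stringsAsFactors = FALSE)
  for (i in sample(nrow(cells), k)) {             # k distinct cells
    df[cells$row[i], cells$col[i]] <- NA
  }
  df
}

set.seed(6245)
data <- round(data.frame(x1 = rnorm(5), x2 = rnorm(5),
                         x3 = rnorm(5), x4 = rnorm(5)), digits = 3)
drop_with_loop(data, c("x3", "x4"))
```

The loop is easy to read but does one data-frame assignment per cell; the logical-matrix indexing used elsewhere in the thread does the same job in a single assignment.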
Re: [R] Randomly drop a percent of data from a data.frame
Try this:

data <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5),x4=rnorm(5))
data <- round(data,digits=3)
#get the total counts
n = prod(dim(data))
#set up a dummy array/matrix
dummy <- rep(F, n/2)
dummy[sample(1:(n/2), n*.2)] <- T
# 5x2 dummy matrix with T and F
matrix(dummy, nc = 2)
#subset the T indices in x3 and x4 and replace with NAs
data[,c("x3", "x4")][matrix(dummy, nc = 2)] <- NA
data
#      x1     x2     x3     x4
#1 -1.310  0.659     NA  0.510
#2 -3.003 -0.004     NA     NA
#3  0.584  0.310     NA -0.087
#4  1.644 -2.792 -0.390 -0.382
#5 -1.791  0.840  1.137  0.820

Richard

On Fri, Aug 16, 2013 at 2:34 PM, arun <smartpink...@yahoo.com> wrote:
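The last step of this answer relies on indexing a data frame with a logical matrix of the same shape: extraction returns the selected cells as a vector in column-major order, and assignment writes into exactly those cells. A tiny illustration with a hand-written mask (the values are made up for the example):

```r
d <- data.frame(x3 = c(1, 2, 3), x4 = c(4, 5, 6))
# Same shape as d: TRUE marks the cells to blank
mask <- matrix(c(TRUE, FALSE, FALSE,
                 FALSE, TRUE, TRUE), ncol = 2)
d[mask]          # 1 5 6: selected cells, read column by column
d[mask] <- NA
d
#   x3 x4
# 1 NA  4
# 2  2 NA
# 3  3 NA
```

Because the mask must match the block's dimensions, `n/2` works in the answer above only because x3 and x4 are exactly half of the four columns (10 = 5 x 2 cells).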