Re: [R] any way to make it work faster (deleting rows that contain certain values)
Chuck, thank you, but I am not sure I understood what you meant.
There are a lot of rows in "index" where at least 2 columns have equal
values, and a lot of rows where column 1 has 5 and some other column has 2
(the same for 6 in column 1 and 3 in some other column, etc.).
Thanks a lot for clarifying!
Dimitri

On Tue, Sep 22, 2009 at 5:36 PM, Charles C. Berry wrote:
> On Tue, 22 Sep 2009, Dimitri Liakhovitski wrote:
>
>> Hello, dear R'ers,
>>
>> index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
>>
>> In this case, dim(index) is 7,340,032 (!) and 11.
>> I realize it's huge.
>> Then, I am trying to get rid of the undesired combinations of columns.
>> They should not contain identical values in any 2 columns.
>
> Right, but you have only four values in each of columns 2:11.
>
> And none of them can be identical.
>
> There are exactly
>
>        choose(4,10)
>
> rows that satisfy that constraint for columns 2:11.
>
> The rows of your result are easily enumerated by hand. ;-)
>
> HTH,
>
> Chuck
>
>> Also if column 1 has a value of 5, there should be no 2 in any other
>> column,
>> if column 1 has a value of 6, there should be no 3 in any other column,
>> and
>> if column 1 has a value of 7, there should be no 4 in any other column.
>> I wrote a generic script to achieve that (below).
>> However, I was wondering if it's possible to make it any faster - it
>> looks like with that huge index it's going to take me days...
>>
>> Thanks a lot for any suggestion!
>> Dimitri
>>
>> index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
>> bad.pairs<-matrix(c(1,1,2,2,3,3,4,4,5,2,6,3,7,4),nrow=7,ncol=2,byrow=T)
>> for(i in 1:ncol(index)){            # looping through columns of the "index"
>>   for(pair in 1:nrow(bad.pairs)){   # looping through rows of "bad.pairs"
>>     keep<-sapply(1:nrow(index), function(x){
>>       temp<-(index[[x,i]]==bad.pairs[pair,1]) &
>>         (any(index[x,-i]==bad.pairs[pair,2]))
>>       return(temp)
>>     })
>>     index<-index[!keep,]
>>   }
>> }

--
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] any way to make it work faster (deleting rows that contain certain values)
On Tue, 22 Sep 2009, Dimitri Liakhovitski wrote:

> Hello, dear R'ers,
>
> index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
>
> In this case, dim(index) is 7,340,032 (!) and 11.
> I realize it's huge.
> Then, I am trying to get rid of the undesired combinations of columns.
> They should not contain identical values in any 2 columns.

Right, but you have only four values in each of columns 2:11.

And none of them can be identical.

There are exactly

       choose(4,10)

rows that satisfy that constraint for columns 2:11.

The rows of your result are easily enumerated by hand. ;-)

HTH,

Chuck

> Also if column 1 has a value of 5, there should be no 2 in any other column,
> if column 1 has a value of 6, there should be no 3 in any other column, and
> if column 1 has a value of 7, there should be no 4 in any other column.
> I wrote a generic script to achieve that (below).
> However, I was wondering if it's possible to make it any faster - it
> looks like with that huge index it's going to take me days...
>
> Thanks a lot for any suggestion!
> Dimitri
>
> index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
> bad.pairs<-matrix(c(1,1,2,2,3,3,4,4,5,2,6,3,7,4),nrow=7,ncol=2,byrow=T)
> for(i in 1:ncol(index)){            # looping through columns of the "index"
>   for(pair in 1:nrow(bad.pairs)){   # looping through rows of "bad.pairs"
>     keep<-sapply(1:nrow(index), function(x){
>       temp<-(index[[x,i]]==bad.pairs[pair,1]) &
>         (any(index[x,-i]==bad.pairs[pair,2]))
>       return(temp)
>     })
>     index<-index[!keep,]
>   }
> }

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
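
Chuck's hint can be checked directly in R. The following is only an illustrative sketch and was not posted in the thread: it evaluates the count he mentions and then confirms the same pigeonhole effect on a hypothetical scaled-down grid. The names `n_values`, `n_columns`, `small` and `has_dup` are assumptions introduced for the example.

```r
# Columns 2:11 are ten columns that each draw from only four values (1:4).
n_values  <- 4
n_columns <- 10

# Number of ways to choose 10 distinct values out of 4.
choose(n_values, n_columns)   # 0

# Hypothetical scaled-down analogue: three columns, only two possible values.
small <- expand.grid(1:2, 1:2, 1:2)

# anyDuplicated() returns 0 when a row has no repeated value.
has_dup <- apply(small, 1, function(r) anyDuplicated(r) > 0)

# Rows in which all three columns differ: none, for the same reason.
sum(!has_dup)
```

With more columns than distinct values, every row must repeat a value somewhere, so the "no identical values in any 2 columns" rule alone removes every row of the full grid.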
[R] any way to make it work faster (deleting rows that contain certain values)
Hello, dear R'ers,

index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)

In this case, dim(index) is 7,340,032 (!) and 11.
I realize it's huge.
Then, I am trying to get rid of the undesired combinations of columns.
They should not contain identical values in any 2 columns.
Also if column 1 has a value of 5, there should be no 2 in any other column,
if column 1 has a value of 6, there should be no 3 in any other column, and
if column 1 has a value of 7, there should be no 4 in any other column.
I wrote a generic script to achieve that (below).
However, I was wondering if it's possible to make it any faster - it
looks like with that huge index it's going to take me days...

Thanks a lot for any suggestion!
Dimitri

index<-expand.grid(1:7,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4,1:4)
bad.pairs<-matrix(c(1,1,2,2,3,3,4,4,5,2,6,3,7,4),nrow=7,ncol=2,byrow=T)
for(i in 1:ncol(index)){            # looping through columns of the "index"
  for(pair in 1:nrow(bad.pairs)){   # looping through rows of "bad.pairs"
    keep<-sapply(1:nrow(index), function(x){
      temp<-(index[[x,i]]==bad.pairs[pair,1]) &
        (any(index[x,-i]==bad.pairs[pair,2]))
      return(temp)
    })
    index<-index[!keep,]
  }
}

--
Dimitri Liakhovitski
Ninah.com
dimitri.liakhovit...@ninah.com
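
For readers who want the same filter without the row-by-row `sapply`, here is a minimal vectorized sketch. It is not a solution posted in this thread. It assumes the grid fits in memory as a matrix, and the names `filter_grid`, `drop_row`, `small` and `result` are hypothetical. Whether a row is removed depends only on that row, so the flags can be accumulated first and the rows dropped once at the end. It is run on a scaled-down grid (column 1 in 1:7, three more columns in 1:4) so that it finishes instantly.

```r
bad.pairs <- matrix(c(1,1, 2,2, 3,3, 4,4, 5,2, 6,3, 7,4),
                    nrow = 7, ncol = 2, byrow = TRUE)

# Drop every row in which some column equals bad.pairs[p, 1] while any
# OTHER column equals bad.pairs[p, 2]: the same rule as the loop above,
# but evaluated on whole columns at once.
filter_grid <- function(grid, bad.pairs) {
  m <- as.matrix(grid)
  drop_row <- logical(nrow(m))            # all FALSE to start with
  for (i in seq_len(ncol(m))) {
    for (p in seq_len(nrow(bad.pairs))) {
      first_hit <- m[, i] == bad.pairs[p, 1]
      other_hit <- rowSums(m[, -i, drop = FALSE] == bad.pairs[p, 2]) > 0
      drop_row  <- drop_row | (first_hit & other_hit)
    }
  }
  grid[!drop_row, , drop = FALSE]
}

# Scaled-down grid, small enough that some rows survive.
small  <- expand.grid(1:7, 1:4, 1:4, 1:4)
result <- filter_grid(small, bad.pairs)
nrow(result)   # 42
```

The two loops now run only over columns and bad pairs (11 x 7 passes for the full grid), with each pass comparing entire columns. On the full 11-column grid the function should return zero rows, which is consistent with Chuck's choose(4,10) remark.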