Hi, Charlie,

Thank you for the reply. Maybe I don't need the frequency of every pair. I only need the top pairs, say the 50 with the highest frequency. Is there any way to avoid calculating the frequency for all the pairs?

Thanks,
Cindy

On Sun, Nov 15, 2009 at 4:18 PM, cls59 <ch...@sharpsteen.net> wrote:

> cindy Guo wrote:
> >
> > Hi, All,
> >
> > I have an n by m matrix with each entry between 1 and 15000. I want to
> > know the frequency of each pair in 1:15000 that occur together in rows.
> > So for example, if the matrix is
> > 2 5 1 6
> > 1 7 8 2
> > 3 7 6 2
> > 9 8 5 7
> > Pair (2,6) (un-ordered) occurs together in rows 1 and 3. I want to return
> > the value 2 for this pair as well as that for all pairs. Is there a fast
> > way to do this avoiding loops? Loops take too long.
> >
> > Thank you,
> >
> > Cindy
>
> Use %in% to check for the presence of the numbers in a row and apply() to
> efficiently execute the test for each row:
>
> tstMatrix <- matrix( c(2,5,1,6,
>                        1,7,8,2,
>                        3,7,6,2,
>                        9,8,5,7), nrow=4, byrow=TRUE )
>
> matches <- apply( tstMatrix, 1, function( row ){
>   if( 2 %in% row && 6 %in% row ){
>     return( 2 )
>   } else {
>     return( 0 )
>   }
> })
>
> matches
> [1] 2 0 2 0
>
> If you have more than one pair, it gets a little tricky. Say you are also
> looking for the pair (7,8). Store them as a list:
>
> pairList <- list( c(2,6), c(7,8) )
>
> Then use sapply() to efficiently iterate over the pair list and execute the
> apply() test:
>
> matchMatrix <- sapply( pairList, function( pair ){
>   matches <- apply( tstMatrix, 1, function( row ){
>     if( pair[1] %in% row && pair[2] %in% row ){
>       return( pair[1] )
>     } else {
>       return( 0 )
>     }
>   })
>   return( matches )
> })
>
> matchMatrix
>
>      [,1] [,2]
> [1,]    2    0
> [2,]    0    7
> [3,]    2    0
> [4,]    0    7
>
> If you're looking to apply the above method to every possible permutation
> of 2 numbers that may be generated from the range of numbers 1:15000...
> that's 225,000,000 pairs. expand.grid() can generate the required pair
> list-- but that step alone causes a memory allocation of ~6 GB on my
> machine.
>
> If you don't have a pile of CPU cores and RAM at your disposal, you can
> probably:
>
> 1. Restrict the upper end of your range to the maximal entry present in
>    your matrix since all other combinations have zero occurrences.
>
> 2. Break the list of pairs up into several sublists, run the tests, and
>    aggregate the results.
>
> Either way, the analysis will take some time despite the efficiencies of
> the apply family of functions due to the sheer size of the problem. If you
> have more than one CPU, I would recommend taking a look at parallelized
> apply functions, perhaps using a package like snowfall, as the testing of
> the pairs is an "embarrassingly parallel" problem.
>
> Hopefully I'm misunderstanding the scope of your problem.
>
> Good luck!
>
> -Charlie
>
> -----
> Charlie Sharpsteen
> Undergraduate
> Environmental Resources Engineering
> Humboldt State University
> --
> View this message in context:
> http://old.nabble.com/pairs-tp26364801p26365206.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
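
On the follow-up question at the top of this message (only the top 50 pairs are needed), one possible approach is to count only the pairs that actually occur in some row, rather than testing every pair from 1:15000. The following is a minimal sketch, not a tested solution for the full-size problem. It assumes that a value repeated within a row should count once for that row, and the names countRowPairs and pairCounts are hypothetical, not from the thread:

```r
# Example matrix from the original question.
tstMatrix <- matrix(c(2, 5, 1, 6,
                      1, 7, 8, 2,
                      3, 7, 6, 2,
                      9, 8, 5, 7), nrow = 4, byrow = TRUE)

# Hypothetical helper: tabulate the unordered pairs that co-occur in a row.
# Assumption: duplicates within a row count once, hence unique().
countRowPairs <- function(mat) {
  keys <- unlist(lapply(seq_len(nrow(mat)), function(i) {
    vals <- sort(unique(mat[i, ]))
    # combn() treats a single number n as 1:n, so skip rows
    # with fewer than two distinct values.
    if (length(vals) < 2) return(character(0))
    prs <- combn(vals, 2)  # 2 x k matrix, smaller value in row 1
    paste(prs[1, ], prs[2, ], sep = "-")
  }))
  # Frequency of each pair key, most frequent first.
  sort(table(keys), decreasing = TRUE)
}

pairCounts <- countRowPairs(tstMatrix)

# The 50 most frequent pairs (fewer if fewer distinct pairs exist).
head(pairCounts, 50)
```

With this approach an n by m matrix produces at most n * choose(m, 2) keys, so the work depends on the size of the matrix and not on the 1:15000 range. Whether that fits in memory for a particular n and m would still need to be checked.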