Hi, Charlie,

Thank you for the reply. Maybe I don't need the frequency of every pair. I
only need the top pairs, say the 50 with the highest frequency. Is there any
way to avoid calculating the frequency for all the pairs?

Thanks,

Cindy
On Sun, Nov 15, 2009 at 4:18 PM, cls59 <ch...@sharpsteen.net> wrote:

>
>
>
> cindy Guo wrote:
> >
> > Hi, All,
> >
> > I have an n by m matrix with each entry between 1 and 15000. I want to
> > know how often each pair of values in 1:15000 occurs together in a row.
> > So for example, if the matrix is
> > 2 5 1 6
> > 1 7 8 2
> > 3 7 6 2
> > 9 8 5 7
> > Pair (2,6) (un-ordered) occurs together in rows 1 and 3. I want to return
> > the value 2 for this pair as well as that for all pairs. Is there a fast
> > way to do this avoiding loops? Loops take too long.
> >
> > Thank you,
> >
> > Cindy
> >
>
> Use %in% to check for the presence of the numbers in a row and apply() to
> efficiently execute the test for each row:
>
>  tstMatrix <- matrix( c(2,5,1,6,
>    1,7,8,2,
>    3,7,6,2,
>    9,8,5,7), nrow=4, byrow=TRUE )
>
>  matches <- apply( tstMatrix, 1, function( row ){
>
>    if( 2 %in% row && 6 %in% row ){
>
>      return( 2 )
>
>    } else {
>
>      return( 0 )
>
>    }
>
>  })
>
>  matches
>  [1] 2 0 2 0
>
> If you have more than one pair, it gets a little tricky.  Say you are also
> looking for the pair (7,8).  Store them as a list:
>
>  pairList <- list( c(2,6), c(7,8) )
>
> Then use sapply() to efficiently iterate over the pair list and execute the
> apply() test:
>
>  matchMatrix <- sapply( pairList, function( pair ){
>
>    matches <- apply( tstMatrix, 1, function( row ){
>
>      if( pair[1] %in% row && pair[2] %in% row ){
>
>        return( pair[1] )
>
>      } else {
>
>        return( 0 )
>
>      }
>
>    })
>
>    return( matches )
>
>  })
>
>  matchMatrix
>
>       [,1] [,2]
>  [1,]    2    0
>  [2,]    0    7
>  [3,]    2    0
>  [4,]    0    7
>
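
A rough sketch of how the per-row flags above could be collapsed into one
frequency per pair, which is what the original question asks for. It reuses
tstMatrix and pairList from the quoted example; the name pairCounts is only
an illustrative choice, not something from the original code.

```r
tstMatrix <- matrix(c(2, 5, 1, 6,
                      1, 7, 8, 2,
                      3, 7, 6, 2,
                      9, 8, 5, 7), nrow = 4, byrow = TRUE)
pairList <- list(c(2, 6), c(7, 8))

# For each pair, flag the rows that contain both members, then count them.
pairCounts <- sapply(pairList, function(pair) {
  hits <- apply(tstMatrix, 1, function(row) all(pair %in% row))
  sum(hits)
})

# Label each count with its pair, e.g. "2-6" (labelling scheme is an assumption).
names(pairCounts) <- sapply(pairList, paste, collapse = "-")
pairCounts
```

Returning TRUE/FALSE per row and summing avoids the ambiguity of using
pair[1] as the "found" marker.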
>
>
> If you're looking to apply the above method to every possible ordered pair
> of numbers from the range 1:15000... that's 15000^2 = 225,000,000 pairs
> (112,492,500 if each unordered pair is counted only once). expand.grid()
> can generate the required pair list, but that step alone causes a memory
> allocation of ~6 GB on my machine.
>
> If you don't have a pile of CPU cores and RAM at your disposal, you can
> probably:
>
>  1. Restrict the upper end of your range to the maximal entry present in
> your matrix since all other combinations have zero occurrences.
>
>  2. Break the list of pairs up into several sublists, run the tests, and
> aggregate the results.
>
> Either way, the analysis will take some time despite the efficiencies of
> the apply family of functions due to the sheer size of the problem.  If you
> have more than one CPU, I would recommend taking a look at parallelized
> apply functions, perhaps using a package like snowfall, as the testing of
> the pairs is an "embarrassingly parallel" problem.
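
A minimal sketch of the parallel idea, with one substitution stated plainly:
it uses the parallel package that ships with current versions of R instead of
snowfall. The worker count of 2 and the name pairCounts are assumptions for
illustration only.

```r
library(parallel)  # bundled with current R; used here in place of snowfall

tstMatrix <- matrix(c(2, 5, 1, 6,
                      1, 7, 8, 2,
                      3, 7, 6, 2,
                      9, 8, 5, 7), nrow = 4, byrow = TRUE)
pairList <- list(c(2, 6), c(7, 8))

cl <- makeCluster(2)            # two worker processes; adjust to your machine
clusterExport(cl, "tstMatrix")  # each worker needs its own copy of the matrix

# Each pair is tested independently, so the pair list can be split across workers.
pairCounts <- parSapply(cl, pairList, function(pair) {
  sum(apply(tstMatrix, 1, function(row) all(pair %in% row)))
})

stopCluster(cl)                 # always release the workers
pairCounts
```

For a pair list this small the cluster start-up cost outweighs any gain; the
approach would only pay off with a very long pair list.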
>
> Hopefully I'm misunderstanding the scope of your problem.
>
>
> Good luck!
>
> -Charlie
>
> -----
> Charlie Sharpsteen
> Undergraduate
> Environmental Resources Engineering
> Humboldt State University
> --
> View this message in context:
> http://old.nabble.com/pairs-tp26364801p26365206.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
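
On the top-50 question, one possible way to avoid testing all pairs is to
enumerate only the pairs that actually occur in some row and tabulate those.
The following is a sketch under that assumption, not a tuned solution; the
names rowPairs, allKeys, pairFreq and topPairs are made up for illustration.

```r
tstMatrix <- matrix(c(2, 5, 1, 6,
                      1, 7, 8, 2,
                      3, 7, 6, 2,
                      9, 8, 5, 7), nrow = 4, byrow = TRUE)

# All unordered pairs within one row, smaller value first, as "a-b" keys.
rowPairs <- function(row) {
  vals <- sort(unique(row))          # unique() so a repeated value counts once per row
  if (length(vals) < 2) return(character(0))
  combn(vals, 2, FUN = paste, collapse = "-")
}

# Collect the keys from every row, then tabulate and sort by frequency.
allKeys <- unlist(lapply(seq_len(nrow(tstMatrix)),
                         function(i) rowPairs(tstMatrix[i, ])))
pairFreq <- sort(table(allKeys), decreasing = TRUE)

# Keep at most the 50 most frequent pairs.
topPairs <- head(pairFreq, 50)
topPairs
```

With n rows and m columns this handles at most n * choose(m, 2) keys, no
matter how large the range of possible values is, so pairs that never occur
cost nothing.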
