Hi Ted,

can you estimate how big the performance/scalability gain of using random projection would be compared to using the standard approach of iterative matrix-vector multiplication?

I already have working prototype code for the latter one, where I extend the approach to do a multiplication between the adjacency matrix and a matrix of steadystate probability vectors so several random walks can be done in parallel.

If the gain would be huge I'd try to dig into random projection though.

--sebastian

On 25.07.2011 18:06, Ted Dunning wrote:
This is the same as solving this system:

    (cM - I) r_k = (c-1) e_k

To do that, you can use any decomposition technique to compute an
approximate functional inverse of (cM - I).

SVD works so you can use the stochastic SVD algorithm that Dimitriy did.
  You probably can short-cut that substantially, however.

       A = (cM - I)

       Y = A \Omega
       Q_1 R_1 = Y
       B = Q_1^T A

Normally, you compute the SVD decomposition of B here, but I think you
might be able to solve your problem directly at this point.  Of course,
the SVD decomposition of B is very cheap if it fits into memory.

On Mon, Jul 25, 2011 at 8:34 AM, Sebastian Schelter <[email protected]
<mailto:[email protected]>> wrote:

    I don't know as I'm not familiar with the technique of Random
    Projection to be honest. I know of a paper doing the walk in a
    lower-dimensional space:
    http://www.cs.cmu.edu/~htong/__pdf/ICDM06_tong.pdf
    <http://www.cs.cmu.edu/~htong/pdf/ICDM06_tong.pdf>

    I plan to base the implementation on the formula from the Pegasus
    paper, where one searches for the vector r_k that satisfies the
    following equation:

    r_k = cMr_k + (1-c)e_k

    with c being the probability not to teleport away from a reached
    node, M being the transposed, row-normalized adjacency matrix
    (similar to the one used in PageRank) and e_k being a vector with
    the same dimension as the number of vertices with the kth element
    being 1 and all others 0.

    This formulation maps kinda nice to Map/Reduce as we simply need to
    create the adjacency matrix and then just need to iteratively
    multiply it by the steadystate probability vector (similar to
    PageRank). So we could include this algorithm without having to
    write a lot of custom code because we can express a lot of it
    through our already existing linear algebra code.

    --sebastian


    On 25.07.2011 17 <tel:25.07.2011%2017>:17, Ted Dunning wrote:

        Can this be done with a random projection?

        On Mon, Jul 25, 2011 at 4:59 AM, Sebastian Schelter (JIRA)
        <[email protected] <mailto:[email protected]>>wrote:


                 [
            
https://issues.apache.org/__jira/browse/MAHOUT-773?page=__com.atlassian.jira.plugin.__system.issuetabpanels:all-__tabpanel
            
<https://issues.apache.org/jira/browse/MAHOUT-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel>]

            Sebastian Schelter updated MAHOUT-773:
            ------------------------------__--------

                Description:
            I'll create an implementation of Random Walk with Restarts
            as described in
            Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale Graph
            Mining System -
            Implementation and Observations"
            http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf
            <http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf>

            The algorithm is a random walk similar to PageRank with the
            difference that
            you start at and teleport to a certain node. The
            probabilities it computes
            can be seen as a measure of proximity between the start node
            and a reached
            node. To my knowledge RWR can be e.g used for link
            predicition in social
            networks.

            I will try to create an implementation that is able to do
            several walks in
            parallel and I will assume that a steadystate probability
            vector fits in
            memory.

            I don't plan to use the implementation details from the
            paper but I'll
            model the algorithm as an iterative multiplication between
            the adjacency
            matrix of the graph and the matrix created from the
            steadystate probability
            vectors for the vertices we compute the random walks for.

              was:
            I'll create an implementation of Random Walk with Restarts
            as described in
            Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale Graph
            Mining System -
            Implementation and Observations"
            http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf
            <http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf>

            The algorithm is a random walk similar to PageRank with the
            difference that
            you start at and teleport to a certain node. The
            probabilities it computes
            can be seen as a measure of proximity between the start node
            and a reached
            node. To my knowledge RWR can be e.g used for link
            predicition in social
            networks.

            I will try to create an implementation that is able to do
            several walks in
            parallel and I will assume that a steadystate probability
            vector fits in
            memory.


                Implement Random Walk with Restarts
                ------------------------------__-----

                                 Key: MAHOUT-773
                                 URL:
                https://issues.apache.org/__jira/browse/MAHOUT-773
                <https://issues.apache.org/jira/browse/MAHOUT-773>
                             Project: Mahout
                          Issue Type: New Feature
                          Components: Graph
                    Affects Versions: 0.6
                            Reporter: Sebastian Schelter
                            Assignee: Sebastian Schelter

                I'll create an implementation of Random Walk with
                Restarts as described

            in Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale
            Graph Mining System
            - Implementation and Observations"
            http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf
            <http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf>

                The algorithm is a random walk similar to PageRank with
                the difference

            that you start at and teleport to a certain node. The
            probabilities it
            computes can be seen as a measure of proximity between the
            start node and a
            reached node. To my knowledge RWR can be e.g used for link
            predicition in
            social networks.

                I will try to create an implementation that is able to
                do several walks

            in parallel and I will assume that a steadystate probability
            vector fits in
            memory.

                I don't plan to use the implementation details from the
                paper but I'll

            model the algorithm as an iterative multiplication between
            the adjacency
            matrix of the graph and the matrix created from the
            steadystate probability
            vectors for the vertices we compute the random walks for.

            --
            This message is automatically generated by JIRA.
            For more information on JIRA, see:
            http://www.atlassian.com/__software/jira
            <http://www.atlassian.com/software/jira>







Reply via email to