Re: [jira] [Updated] (MAHOUT-773) Implement Random Walk with Restarts

Ted Dunning Fri, 29 Jul 2011 01:49:16 -0700

It means that the random walk occasionally jumps to some random node without
respecting connections.


On Thu, Jul 28, 2011 at 10:55 PM, Lance Norskog <[email protected]> wrote:

> The word 'teleport' does not appear anywhere in the PEGASUS paper, or
> the RWR paper mentioned (reference 36). What does it mean?
>
> On Tue, Jul 26, 2011 at 12:56 AM, Ted Dunning <[email protected]>
> wrote:
> > It is hard to say, but if your skinny forms of the matrix will fit in
> > memory, then you will only need three or so map-reduce steps with no
> > iteration.  They should take about as long as your current iterations or
> a
> > little faster.
> >
> > For the matrices quoted in the paper you cited, I think I could even do
> the
> > computation in R.
> >
> > On Tue, Jul 26, 2011 at 12:08 AM, Sebastian Schelter <[email protected]>
> wrote:
> >
> >> Hi Ted,
> >>
> >> can you estimate how big the performance/scalability gain of using
> random
> >> projection would be compared to using the standard approach of iterative
> >> matrix-vector multiplication?
> >>
> >> I already have working prototype code for the latter one, where I extend
> >> the approach to do a multiplication between the adjacency matrix and a
> >> matrix of steadystate probability vectors so several random walks can be
> >> done in parallel.
> >>
> >> If the gain would be huge I'd try to dig into random projection though.
> >>
> >> --sebastian
> >>
> >>
> >> On 25.07.2011 18:06, Ted Dunning wrote:
> >>
> >>> This is the same as solving this system:
> >>>
> >>>    (cM - I) r_k = (c-1) e_k
> >>>
> >>> To do that, you can use any decomposition technique to compute an
> >>> approximate functional inverse of (cM - I).
> >>>
> >>> SVD works so you can use the stochastic SVD algorithm that Dimitriy
> did.
> >>>  You probably can short-cut that substantially, however.
> >>>
> >>>       A = (cM - I)
> >>>
> >>>       Y = A \Omega
> >>>       Q_1 R_1 = Y
> >>>       B = Q_1^T A
> >>>
> >>> Normally, you compute the SVD decomposition of B here, but I think you
> >>> might be able to solve your problem directly at this point.  Of course,
> >>> the SVD decomposition of B is very cheap if it fits into memory.
> >>>
> >>> On Mon, Jul 25, 2011 at 8:34 AM, Sebastian Schelter <[email protected]
> >>> <mailto:[email protected]>> wrote:
> >>>
> >>>    I don't know as I'm not familiar with the technique of Random
> >>>    Projection to be honest. I know of a paper doing the walk in a
> >>>    lower-dimensional space:
> >>>    http://www.cs.cmu.edu/~htong/_**_pdf/ICDM06_tong.pdf<
> http://www.cs.cmu.edu/~htong/__pdf/ICDM06_tong.pdf>
> >>>    <http://www.cs.cmu.edu/~htong/**pdf/ICDM06_tong.pdf<
> http://www.cs.cmu.edu/~htong/pdf/ICDM06_tong.pdf>
> >>> >
> >>>
> >>>    I plan to base the implementation on the formula from the Pegasus
> >>>    paper, where one searches for the vector r_k that satisfies the
> >>>    following equation:
> >>>
> >>>    r_k = cMr_k + (1-c)e_k
> >>>
> >>>    with c being the probability not to teleport away from a reached
> >>>    node, M being the transposed, row-normalized adjacency matrix
> >>>    (similar to the one used in PageRank) and e_k being a vector with
> >>>    the same dimension as the number of vertices with the kth element
> >>>    being 1 and all others 0.
> >>>
> >>>    This formulation maps kinda nice to Map/Reduce as we simply need to
> >>>    create the adjacency matrix and then just need to iteratively
> >>>    multiply it by the steadystate probability vector (similar to
> >>>    PageRank). So we could include this algorithm without having to
> >>>    write a lot of custom code because we can express a lot of it
> >>>    through our already existing linear algebra code.
> >>>
> >>>    --sebastian
> >>>
> >>>
> >>>    On 25.07.2011 17 <tel:25.07.2011%2017>:17, Ted Dunning wrote:
> >>>
> >>>        Can this be done with a random projection?
> >>>
> >>>        On Mon, Jul 25, 2011 at 4:59 AM, Sebastian Schelter (JIRA)
> >>>        <[email protected] <mailto:[email protected]>>**wrote:
> >>>
> >>>
> >>>
> >>>                 [
> >>>
> https://issues.apache.org/__**jira/browse/MAHOUT-773?page=__**
> >>> com.atlassian.jira.plugin.__**system.issuetabpanels:all-__**tabpanel<
> https://issues.apache.org/__jira/browse/MAHOUT-773?page=__com.atlassian.jira.plugin.__system.issuetabpanels:all-__tabpanel
> >
> >>>            <https://issues.apache.org/**jira/browse/MAHOUT-773?page=**
> >>> com.atlassian.jira.plugin.**system.issuetabpanels:all-**tabpanel<
> https://issues.apache.org/jira/browse/MAHOUT-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> >
> >>> >]
> >>>
> >>>            Sebastian Schelter updated MAHOUT-773:
> >>>            ------------------------------**__--------
> >>>
> >>>                Description:
> >>>            I'll create an implementation of Random Walk with Restarts
> >>>            as described in
> >>>            Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale Graph
> >>>            Mining System -
> >>>            Implementation and Observations"
> >>>            http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_**
> >>> _pegasus.pdf<
> http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf>
> >>>            <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-**
> >>> pegasus.pdf<
> http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf>
> >>> >
> >>>
> >>>            The algorithm is a random walk similar to PageRank with the
> >>>            difference that
> >>>            you start at and teleport to a certain node. The
> >>>            probabilities it computes
> >>>            can be seen as a measure of proximity between the start node
> >>>            and a reached
> >>>            node. To my knowledge RWR can be e.g used for link
> >>>            predicition in social
> >>>            networks.
> >>>
> >>>            I will try to create an implementation that is able to do
> >>>            several walks in
> >>>            parallel and I will assume that a steadystate probability
> >>>            vector fits in
> >>>            memory.
> >>>
> >>>            I don't plan to use the implementation details from the
> >>>            paper but I'll
> >>>            model the algorithm as an iterative multiplication between
> >>>            the adjacency
> >>>            matrix of the graph and the matrix created from the
> >>>            steadystate probability
> >>>            vectors for the vertices we compute the random walks for.
> >>>
> >>>              was:
> >>>            I'll create an implementation of Random Walk with Restarts
> >>>            as described in
> >>>            Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale Graph
> >>>            Mining System -
> >>>            Implementation and Observations"
> >>>            http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_**
> >>> _pegasus.pdf<
> http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf>
> >>>            <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-**
> >>> pegasus.pdf<
> http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf>
> >>> >
> >>>
> >>>            The algorithm is a random walk similar to PageRank with the
> >>>            difference that
> >>>            you start at and teleport to a certain node. The
> >>>            probabilities it computes
> >>>            can be seen as a measure of proximity between the start node
> >>>            and a reached
> >>>            node. To my knowledge RWR can be e.g used for link
> >>>            predicition in social
> >>>            networks.
> >>>
> >>>            I will try to create an implementation that is able to do
> >>>            several walks in
> >>>            parallel and I will assume that a steadystate probability
> >>>            vector fits in
> >>>            memory.
> >>>
> >>>
> >>>                Implement Random Walk with Restarts
> >>>                ------------------------------**__-----
> >>>
> >>>                                 Key: MAHOUT-773
> >>>                                 URL:
> >>>                https://issues.apache.org/__**jira/browse/MAHOUT-773<
> https://issues.apache.org/__jira/browse/MAHOUT-773>
> >>>                <https://issues.apache.org/**jira/browse/MAHOUT-773<
> https://issues.apache.org/jira/browse/MAHOUT-773>
> >>> >
> >>>                             Project: Mahout
> >>>                          Issue Type: New Feature
> >>>                          Components: Graph
> >>>                    Affects Versions: 0.6
> >>>                            Reporter: Sebastian Schelter
> >>>                            Assignee: Sebastian Schelter
> >>>
> >>>                I'll create an implementation of Random Walk with
> >>>                Restarts as described
> >>>
> >>>            in Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale
> >>>            Graph Mining System
> >>>            - Implementation and Observations"
> >>>            http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_**
> >>> _pegasus.pdf<
> http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf>
> >>>            <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-**
> >>> pegasus.pdf<
> http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf>
> >>> >
> >>>
> >>>                The algorithm is a random walk similar to PageRank with
> >>>                the difference
> >>>
> >>>            that you start at and teleport to a certain node. The
> >>>            probabilities it
> >>>            computes can be seen as a measure of proximity between the
> >>>            start node and a
> >>>            reached node. To my knowledge RWR can be e.g used for link
> >>>            predicition in
> >>>            social networks.
> >>>
> >>>                I will try to create an implementation that is able to
> >>>                do several walks
> >>>
> >>>            in parallel and I will assume that a steadystate probability
> >>>            vector fits in
> >>>            memory.
> >>>
> >>>                I don't plan to use the implementation details from the
> >>>                paper but I'll
> >>>
> >>>            model the algorithm as an iterative multiplication between
> >>>            the adjacency
> >>>            matrix of the graph and the matrix created from the
> >>>            steadystate probability
> >>>            vectors for the vertices we compute the random walks for.
> >>>
> >>>            --
> >>>            This message is automatically generated by JIRA.
> >>>            For more information on JIRA, see:
> >>>            http://www.atlassian.com/__**software/jira<
> http://www.atlassian.com/__software/jira>
> >>>            <http://www.atlassian.com/**software/jira<
> http://www.atlassian.com/software/jira>
> >>> >
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
>
>
>
> --
> Lance Norskog
> [email protected]
>

Re: [jira] [Updated] (MAHOUT-773) Implement Random Walk with Restarts

Reply via email to