The word 'teleport' does not appear anywhere in the PEGASUS paper, or the RWR paper mentioned (reference 36). What does it mean?
On Tue, Jul 26, 2011 at 12:56 AM, Ted Dunning <[email protected]> wrote: > It is hard to say, but if your skinny forms of the matrix will fit in > memory, then you will only need three or so map-reduce steps with no > iteration. They should take about as long as your current iterations or a > little faster. > > For the matrices quoted in the paper you cited, I think I could even do the > computation in R. > > On Tue, Jul 26, 2011 at 12:08 AM, Sebastian Schelter <[email protected]> wrote: > >> Hi Ted, >> >> can you estimate how big the performance/scalability gain of using random >> projection would be compared to using the standard approach of iterative >> matrix-vector multiplication? >> >> I already have working prototype code for the latter one, where I extend >> the approach to do a multiplication between the adjacency matrix and a >> matrix of steadystate probability vectors so several random walks can be >> done in parallel. >> >> If the gain would be huge I'd try to dig into random projection though. >> >> --sebastian >> >> >> On 25.07.2011 18:06, Ted Dunning wrote: >> >>> This is the same as solving this system: >>> >>> (cM - I) r_k = (c-1) e_k >>> >>> To do that, you can use any decomposition technique to compute an >>> approximate functional inverse of (cM - I). >>> >>> SVD works so you can use the stochastic SVD algorithm that Dimitriy did. >>> You probably can short-cut that substantially, however. >>> >>> A = (cM - I) >>> >>> Y = A \Omega >>> Q_1 R_1 = Y >>> B = Q_1^T A >>> >>> Normally, you compute the SVD decomposition of B here, but I think you >>> might be able to solve your problem directly at this point. Of course, >>> the SVD decomposition of B is very cheap if it fits into memory. >>> >>> On Mon, Jul 25, 2011 at 8:34 AM, Sebastian Schelter <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> I don't know as I'm not familiar with the technique of Random >>> Projection to be honest. I know of a paper doing the walk in a >>> lower-dimensional space: >>> >>> http://www.cs.cmu.edu/~htong/_**_pdf/ICDM06_tong.pdf<http://www.cs.cmu.edu/~htong/__pdf/ICDM06_tong.pdf> >>> >>> <http://www.cs.cmu.edu/~htong/**pdf/ICDM06_tong.pdf<http://www.cs.cmu.edu/~htong/pdf/ICDM06_tong.pdf> >>> > >>> >>> I plan to base the implementation on the formula from the Pegasus >>> paper, where one searches for the vector r_k that satisfies the >>> following equation: >>> >>> r_k = cMr_k + (1-c)e_k >>> >>> with c being the probability not to teleport away from a reached >>> node, M being the transposed, row-normalized adjacency matrix >>> (similar to the one used in PageRank) and e_k being a vector with >>> the same dimension as the number of vertices with the kth element >>> being 1 and all others 0. >>> >>> This formulation maps kinda nice to Map/Reduce as we simply need to >>> create the adjacency matrix and then just need to iteratively >>> multiply it by the steadystate probability vector (similar to >>> PageRank). So we could include this algorithm without having to >>> write a lot of custom code because we can express a lot of it >>> through our already existing linear algebra code. >>> >>> --sebastian >>> >>> >>> On 25.07.2011 17 <tel:25.07.2011%2017>:17, Ted Dunning wrote: >>> >>> Can this be done with a random projection? >>> >>> On Mon, Jul 25, 2011 at 4:59 AM, Sebastian Schelter (JIRA) >>> <[email protected] <mailto:[email protected]>>**wrote: >>> >>> >>> >>> [ >>> https://issues.apache.org/__**jira/browse/MAHOUT-773?page=__** >>> com.atlassian.jira.plugin.__**system.issuetabpanels:all-__**tabpanel<https://issues.apache.org/__jira/browse/MAHOUT-773?page=__com.atlassian.jira.plugin.__system.issuetabpanels:all-__tabpanel> >>> <https://issues.apache.org/**jira/browse/MAHOUT-773?page=** >>> com.atlassian.jira.plugin.**system.issuetabpanels:all-**tabpanel<https://issues.apache.org/jira/browse/MAHOUT-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel> >>> >] >>> >>> Sebastian Schelter updated MAHOUT-773: >>> ------------------------------**__-------- >>> >>> Description: >>> I'll create an implementation of Random Walk with Restarts >>> as described in >>> Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale Graph >>> Mining System - >>> Implementation and Observations" >>> http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_** >>> _pegasus.pdf<http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf> >>> <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-** >>> pegasus.pdf<http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf> >>> > >>> >>> The algorithm is a random walk similar to PageRank with the >>> difference that >>> you start at and teleport to a certain node. The >>> probabilities it computes >>> can be seen as a measure of proximity between the start node >>> and a reached >>> node. To my knowledge RWR can be e.g used for link >>> predicition in social >>> networks. >>> >>> I will try to create an implementation that is able to do >>> several walks in >>> parallel and I will assume that a steadystate probability >>> vector fits in >>> memory. >>> >>> I don't plan to use the implementation details from the >>> paper but I'll >>> model the algorithm as an iterative multiplication between >>> the adjacency >>> matrix of the graph and the matrix created from the >>> steadystate probability >>> vectors for the vertices we compute the random walks for. >>> >>> was: >>> I'll create an implementation of Random Walk with Restarts >>> as described in >>> Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale Graph >>> Mining System - >>> Implementation and Observations" >>> http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_** >>> _pegasus.pdf<http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf> >>> <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-** >>> pegasus.pdf<http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf> >>> > >>> >>> The algorithm is a random walk similar to PageRank with the >>> difference that >>> you start at and teleport to a certain node. The >>> probabilities it computes >>> can be seen as a measure of proximity between the start node >>> and a reached >>> node. To my knowledge RWR can be e.g used for link >>> predicition in social >>> networks. >>> >>> I will try to create an implementation that is able to do >>> several walks in >>> parallel and I will assume that a steadystate probability >>> vector fits in >>> memory. >>> >>> >>> Implement Random Walk with Restarts >>> ------------------------------**__----- >>> >>> Key: MAHOUT-773 >>> URL: >>> >>> https://issues.apache.org/__**jira/browse/MAHOUT-773<https://issues.apache.org/__jira/browse/MAHOUT-773> >>> >>> <https://issues.apache.org/**jira/browse/MAHOUT-773<https://issues.apache.org/jira/browse/MAHOUT-773> >>> > >>> Project: Mahout >>> Issue Type: New Feature >>> Components: Graph >>> Affects Versions: 0.6 >>> Reporter: Sebastian Schelter >>> Assignee: Sebastian Schelter >>> >>> I'll create an implementation of Random Walk with >>> Restarts as described >>> >>> in Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale >>> Graph Mining System >>> - Implementation and Observations" >>> http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_** >>> _pegasus.pdf<http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf> >>> <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-** >>> pegasus.pdf<http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf> >>> > >>> >>> The algorithm is a random walk similar to PageRank with >>> the difference >>> >>> that you start at and teleport to a certain node. The >>> probabilities it >>> computes can be seen as a measure of proximity between the >>> start node and a >>> reached node. To my knowledge RWR can be e.g used for link >>> predicition in >>> social networks. >>> >>> I will try to create an implementation that is able to >>> do several walks >>> >>> in parallel and I will assume that a steadystate probability >>> vector fits in >>> memory. >>> >>> I don't plan to use the implementation details from the >>> paper but I'll >>> >>> model the algorithm as an iterative multiplication between >>> the adjacency >>> matrix of the graph and the matrix created from the >>> steadystate probability >>> vectors for the vertices we compute the random walks for. >>> >>> -- >>> This message is automatically generated by JIRA. >>> For more information on JIRA, see: >>> >>> http://www.atlassian.com/__**software/jira<http://www.atlassian.com/__software/jira> >>> >>> <http://www.atlassian.com/**software/jira<http://www.atlassian.com/software/jira> >>> > >>> >>> >>> >>> >>> >>> >>> >> > -- Lance Norskog [email protected]
