It is hard to say, but if your skinny forms of the matrix will fit in memory, then you will only need three or so map-reduce steps with no iteration. They should take about as long as your current iterations or a little faster.
For the matrices quoted in the paper you cited, I think I could even do the computation in R. On Tue, Jul 26, 2011 at 12:08 AM, Sebastian Schelter <[email protected]> wrote: > Hi Ted, > > can you estimate how big the performance/scalability gain of using random > projection would be compared to using the standard approach of iterative > matrix-vector multiplication? > > I already have working prototype code for the latter one, where I extend > the approach to do a multiplication between the adjacency matrix and a > matrix of steadystate probability vectors so several random walks can be > done in parallel. > > If the gain would be huge I'd try to dig into random projection though. > > --sebastian > > > On 25.07.2011 18:06, Ted Dunning wrote: > >> This is the same as solving this system: >> >> (cM - I) r_k = (c-1) e_k >> >> To do that, you can use any decomposition technique to compute an >> approximate functional inverse of (cM - I). >> >> SVD works so you can use the stochastic SVD algorithm that Dimitriy did. >> You probably can short-cut that substantially, however. >> >> A = (cM - I) >> >> Y = A \Omega >> Q_1 R_1 = Y >> B = Q_1^T A >> >> Normally, you compute the SVD decomposition of B here, but I think you >> might be able to solve your problem directly at this point. Of course, >> the SVD decomposition of B is very cheap if it fits into memory. >> >> On Mon, Jul 25, 2011 at 8:34 AM, Sebastian Schelter <[email protected] >> <mailto:[email protected]>> wrote: >> >> I don't know as I'm not familiar with the technique of Random >> Projection to be honest. I know of a paper doing the walk in a >> lower-dimensional space: >> >> http://www.cs.cmu.edu/~htong/_**_pdf/ICDM06_tong.pdf<http://www.cs.cmu.edu/~htong/__pdf/ICDM06_tong.pdf> >> >> <http://www.cs.cmu.edu/~htong/**pdf/ICDM06_tong.pdf<http://www.cs.cmu.edu/~htong/pdf/ICDM06_tong.pdf> >> > >> >> I plan to base the implementation on the formula from the Pegasus >> paper, where one searches for the vector r_k that satisfies the >> following equation: >> >> r_k = cMr_k + (1-c)e_k >> >> with c being the probability not to teleport away from a reached >> node, M being the transposed, row-normalized adjacency matrix >> (similar to the one used in PageRank) and e_k being a vector with >> the same dimension as the number of vertices with the kth element >> being 1 and all others 0. >> >> This formulation maps kinda nice to Map/Reduce as we simply need to >> create the adjacency matrix and then just need to iteratively >> multiply it by the steadystate probability vector (similar to >> PageRank). So we could include this algorithm without having to >> write a lot of custom code because we can express a lot of it >> through our already existing linear algebra code. >> >> --sebastian >> >> >> On 25.07.2011 17 <tel:25.07.2011%2017>:17, Ted Dunning wrote: >> >> Can this be done with a random projection? >> >> On Mon, Jul 25, 2011 at 4:59 AM, Sebastian Schelter (JIRA) >> <[email protected] <mailto:[email protected]>>**wrote: >> >> >> >> [ >> https://issues.apache.org/__**jira/browse/MAHOUT-773?page=__** >> com.atlassian.jira.plugin.__**system.issuetabpanels:all-__**tabpanel<https://issues.apache.org/__jira/browse/MAHOUT-773?page=__com.atlassian.jira.plugin.__system.issuetabpanels:all-__tabpanel> >> <https://issues.apache.org/**jira/browse/MAHOUT-773?page=** >> com.atlassian.jira.plugin.**system.issuetabpanels:all-**tabpanel<https://issues.apache.org/jira/browse/MAHOUT-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel> >> >] >> >> Sebastian Schelter updated MAHOUT-773: >> ------------------------------**__-------- >> >> Description: >> I'll create an implementation of Random Walk with Restarts >> as described in >> Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale Graph >> Mining System - >> Implementation and Observations" >> http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_** >> _pegasus.pdf<http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf> >> <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-** >> pegasus.pdf<http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf> >> > >> >> The algorithm is a random walk similar to PageRank with the >> difference that >> you start at and teleport to a certain node. The >> probabilities it computes >> can be seen as a measure of proximity between the start node >> and a reached >> node. To my knowledge RWR can be e.g used for link >> predicition in social >> networks. >> >> I will try to create an implementation that is able to do >> several walks in >> parallel and I will assume that a steadystate probability >> vector fits in >> memory. >> >> I don't plan to use the implementation details from the >> paper but I'll >> model the algorithm as an iterative multiplication between >> the adjacency >> matrix of the graph and the matrix created from the >> steadystate probability >> vectors for the vertices we compute the random walks for. >> >> was: >> I'll create an implementation of Random Walk with Restarts >> as described in >> Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale Graph >> Mining System - >> Implementation and Observations" >> http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_** >> _pegasus.pdf<http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf> >> <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-** >> pegasus.pdf<http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf> >> > >> >> The algorithm is a random walk similar to PageRank with the >> difference that >> you start at and teleport to a certain node. The >> probabilities it computes >> can be seen as a measure of proximity between the start node >> and a reached >> node. To my knowledge RWR can be e.g used for link >> predicition in social >> networks. >> >> I will try to create an implementation that is able to do >> several walks in >> parallel and I will assume that a steadystate probability >> vector fits in >> memory. >> >> >> Implement Random Walk with Restarts >> ------------------------------**__----- >> >> Key: MAHOUT-773 >> URL: >> >> https://issues.apache.org/__**jira/browse/MAHOUT-773<https://issues.apache.org/__jira/browse/MAHOUT-773> >> >> <https://issues.apache.org/**jira/browse/MAHOUT-773<https://issues.apache.org/jira/browse/MAHOUT-773> >> > >> Project: Mahout >> Issue Type: New Feature >> Components: Graph >> Affects Versions: 0.6 >> Reporter: Sebastian Schelter >> Assignee: Sebastian Schelter >> >> I'll create an implementation of Random Walk with >> Restarts as described >> >> in Kang, Tsourakakis, Faloutsos, "PEGASUS: A Peta-Scale >> Graph Mining System >> - Implementation and Observations" >> http://www.cs.cmu.edu/~__**christos/PUBLICATIONS/icdm09-_** >> _pegasus.pdf<http://www.cs.cmu.edu/~__christos/PUBLICATIONS/icdm09-__pegasus.pdf> >> <http://www.cs.cmu.edu/~**christos/PUBLICATIONS/icdm09-** >> pegasus.pdf<http://www.cs.cmu.edu/~christos/PUBLICATIONS/icdm09-pegasus.pdf> >> > >> >> The algorithm is a random walk similar to PageRank with >> the difference >> >> that you start at and teleport to a certain node. The >> probabilities it >> computes can be seen as a measure of proximity between the >> start node and a >> reached node. To my knowledge RWR can be e.g used for link >> predicition in >> social networks. >> >> I will try to create an implementation that is able to >> do several walks >> >> in parallel and I will assume that a steadystate probability >> vector fits in >> memory. >> >> I don't plan to use the implementation details from the >> paper but I'll >> >> model the algorithm as an iterative multiplication between >> the adjacency >> matrix of the graph and the matrix created from the >> steadystate probability >> vectors for the vertices we compute the random walks for. >> >> -- >> This message is automatically generated by JIRA. >> For more information on JIRA, see: >> >> http://www.atlassian.com/__**software/jira<http://www.atlassian.com/__software/jira> >> >> <http://www.atlassian.com/**software/jira<http://www.atlassian.com/software/jira> >> > >> >> >> >> >> >> >> >
