Pagerank implementation in Map/Reduce
-------------------------------------

                 Key: MAHOUT-742
                 URL: https://issues.apache.org/jira/browse/MAHOUT-742
             Project: Mahout
          Issue Type: New Feature
          Components: Graph
    Affects Versions: 0.6
            Reporter: Christoph Nagel


Hi,

my name is Christoph Nagel. I'm student on technical university Berlin and 
participating on the course of Isabel Drost and Sebastian Schelter.
My work is to implement the pagerank-algorithm, where the pagerank-vector fits 
in memory.
For the computation I used the naive algorithm shown in the book 'Mining of 
Massive Datasets' from Rajaraman & Ullman 
(http://www-scf.usc.edu/~csci572/2012Spring/UllmanMiningMassiveDataSets.pdf).
Matrix- and vector-multiplication are done with mahout methods.

Most work is the transformation the input graph, which has to consists of a 
nodes- and edges file.
Format of nodes file: <node>\n
Format of edges file: <startNode>\t<endNode>\n

Therefore I created the following classes:
* LineIndexer: assigns each line an index
* EdgesToIndex: indexes the nodes of the edges
* EdgesIndexToTransitionMatrix: creates the transition matrix
* Pagerank: computes PR from transition matrix
* JoinNodesWithPagerank: creates the joined output
* PagerankExampleJob: does the complete job

Each class has a test (not PagerankExampleJob) and I took the example of the 
book for evaluating.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to