2011/6/28 Claudio Martella <[email protected]>
>
> So the question is: if i want to run Pagerank over my graph, do I need
> to be able to store the whole graph in memory?
>

Hi Claudio,
In my opinion you don't need to store the whole graph in memory. For example, you can store the PageRank records in HDFS in the following format:

vertex | adjacency_list

You can split this file into chunks (every BSP node will process exactly one chunk). Every PageRank record can be processed separately, so there is no need to read a whole chunk into memory. If you read and process only one PR record at a time, it will look like a MapReduce algorithm, but you can also read more PR records at once (as many as fit; it depends on your memory constraints).

Best,
Pawel
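
To illustrate the record-at-a-time idea described above, here is a minimal Python sketch. It is not Hama or HDFS code. It assumes each line looks like "vertex | adjacency_list" with space-separated neighbours, and that every vertex currently holds the same rank; a real job would look up a per-vertex rank. The names parse_record, read_batches and contributions are hypothetical helpers invented for this example.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, List[str]]


def parse_record(line: str) -> Record:
    """Parse one 'vertex | adjacency_list' line (space-separated neighbours assumed)."""
    vertex, _, neighbours = line.partition("|")
    return vertex.strip(), neighbours.split()


def read_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[Record]]:
    """Yield lists of at most batch_size records; the chunk is never fully loaded."""
    records = (parse_record(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


def contributions(batch: List[Record], rank: float) -> Iterator[Tuple[str, float]]:
    """Emit (neighbour, share) messages; 'rank' is an assumed uniform current rank."""
    for _vertex, neighbours in batch:
        if neighbours:
            share = rank / len(neighbours)
            for neighbour in neighbours:
                yield neighbour, share


# A tiny in-memory stand-in for one chunk of the file.
chunk = io.StringIO("a | b c\nb | c\nc | a\n")
messages = [
    message
    for batch in read_batches(chunk, batch_size=2)
    for message in contributions(batch, rank=1.0)
]
```

With batch_size=1 this behaves like the one-record-at-a-time case; raising batch_size trades memory for fewer, larger processing steps.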
