Can you elaborate here? Let's say I want to implement a DFS on my graph. I can't picture implementing it by processing the graph in pieces without putting a depth bound (3-4) on it. Let's say we have 200M edges (about 4GB) to start with.
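(For context, here is a toy sketch of what I mean by "in pieces" traversal; the edge-file format, path, and class names are my own assumptions, not anything from this thread. Each full scan of the edge list advances the traversal by one level, so a depth bound of 3-4 turns into 3-4 such passes.)

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashSet;
import java.util.Set;

public class OneHop {
    // One level of expansion: stream the edge list from disk instead of
    // holding the graph in RAM, and collect every node reachable in one
    // more hop from the current frontier. Assumes "src<TAB>dst" lines and
    // a frontier small enough to fit in a HashSet.
    static Set<Long> expand(String edgeFile, Set<Long> frontier) throws Exception {
        Set<Long> next = new HashSet<>();
        try (BufferedReader in = new BufferedReader(new FileReader(edgeFile))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\t");
                long src = Long.parseLong(parts[0]);
                long dst = Long.parseLong(parts[1]);
                if (frontier.contains(src)) {
                    next.add(dst); // edge leaves the current frontier
                }
            }
        }
        return next;
    }
}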
Best,
Bhupesh

On 10/16/08 3:01 PM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:

> On Oct 16, 2008, at 1:52 PM, Bhupesh Bansal wrote:
>
>> We at LinkedIn are trying to run some large graph analysis problems on
>> Hadoop. The fastest way to run would be to keep a copy of the whole
>> graph in RAM at all mappers. (The graph is about 8G in RAM.) We have a
>> cluster of 8-core machines with 8G each.
>
> The best way to deal with it is *not* to load the entire graph in one
> process. In the WebMap at Yahoo, we have a graph of the web that has
> roughly 1 trillion links and 100 billion nodes. See http://tinyurl.com/4fgok6 .
> To invert the links, you process the graph in pieces and resort
> based on the target. You'll get much better performance and scale to
> almost any size.
>
>> What is the best way of doing that? Is there a way so that multiple
>> mappers on the same machine can access a RAM cache? I read about the
>> Hadoop distributed cache; it looks like it copies the file (HDFS/HTTP)
>> locally onto the slaves, but not necessarily into RAM.
>
> You could mmap the file from the distributed cache using MappedByteBuffer.
> Then there will be one copy between JVMs...
>
> -- Owen
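(A minimal sketch of the MappedByteBuffer idea Owen mentions, under my own assumptions: the local path is a hypothetical placeholder for wherever the DistributedCache put the file, and the edge layout of two 32-bit node ids is made up for illustration. The OS page cache keeps one copy of the mapped pages no matter how many task JVMs map the same local file.)

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class GraphMmap {
    public static void main(String[] args) throws Exception {
        // Hypothetical local path produced by the DistributedCache.
        String localGraph = "/local/cache/graph.bin";
        try (RandomAccessFile raf = new RandomAccessFile(localGraph, "r");
             FileChannel channel = raf.getChannel()) {
            // A single map() call is limited to 2GB, so an 8G graph would
            // need several mapped slices; only the first slice is shown.
            long size = Math.min(channel.size(), Integer.MAX_VALUE);
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // Example access: read the first edge as two 32-bit node ids.
            int src = buf.getInt();
            int dst = buf.getInt();
            System.out.println("first edge: " + src + " -> " + dst);
        }
    }
}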