Can you elaborate here? Let's say I want to implement a DFS on my graph. I can't picture implementing it by processing the graph in pieces without putting a depth bound (3-4) on it. Let's say we have 200M edges (about 4GB) to start with.
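(For context, here is a toy sketch of what I mean by "in pieces" traversal; the edge-file format, path, and class names are my own assumptions, not anything from this thread. Each full scan of the edge list advances the traversal by one level, so a depth bound of 3-4 turns into 3-4 such passes.)

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashSet;
import java.util.Set;

public class OneHop {
    // One level of expansion: stream the edge list from disk instead of
    // holding the graph in RAM, and collect every node reachable in one
    // more hop from the current frontier. Assumes "src<TAB>dst" lines and
    // a frontier small enough to fit in a HashSet.
    static Set<Long> expand(String edgeFile, Set<Long> frontier) throws Exception {
        Set<Long> next = new HashSet<>();
        try (BufferedReader in = new BufferedReader(new FileReader(edgeFile))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\t");
                long src = Long.parseLong(parts[0]);
                long dst = Long.parseLong(parts[1]);
                if (frontier.contains(src)) {
                    next.add(dst); // edge leaves the current frontier
                }
            }
        }
        return next;
    }
}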
Best,
Bhupesh

On 10/16/08 3:01 PM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:

> On Oct 16, 2008, at 1:52 PM, Bhupesh Bansal wrote:
>
>> We at LinkedIn are trying to run some large graph analysis problems on
>> Hadoop. The fastest way to run would be to keep a copy of the whole
>> graph in RAM at all mappers. (The graph is about 8G in RAM.) We have a
>> cluster of 8-core machines with 8G each.
>
> The best way to deal with it is *not* to load the entire graph in one
> process. In the WebMap at Yahoo, we have a graph of the web that has
> roughly 1 trillion links and 100 billion nodes. See http://tinyurl.com/4fgok6 .
> To invert the links, you process the graph in pieces and resort
> based on the target. You'll get much better performance and scale to
> almost any size.
>
>> What is the best way of doing that? Is there a way so that multiple
>> mappers on the same machine can access a RAM cache? I read about the
>> Hadoop distributed cache; it looks like it copies the file (HDFS/HTTP)
>> locally onto the slaves, but not necessarily into RAM.
>
> You could mmap the file from the distributed cache using MappedByteBuffer.
> Then there will be one copy between JVMs...
>
> -- Owen
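(A minimal sketch of the MappedByteBuffer idea Owen mentions, under my own assumptions: the local path is a hypothetical placeholder for wherever the DistributedCache put the file, and the edge layout of two 32-bit node ids is made up for illustration. The OS page cache keeps one copy of the mapped pages no matter how many task JVMs map the same local file.)

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class GraphMmap {
    public static void main(String[] args) throws Exception {
        // Hypothetical local path produced by the DistributedCache.
        String localGraph = "/local/cache/graph.bin";
        try (RandomAccessFile raf = new RandomAccessFile(localGraph, "r");
             FileChannel channel = raf.getChannel()) {
            // A single map() call is limited to 2GB, so an 8G graph would
            // need several mapped slices; only the first slice is shown.
            long size = Math.min(channel.size(), Integer.MAX_VALUE);
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // Example access: read the first edge as two 32-bit node ids.
            int src = buf.getInt();
            int dst = buf.getInt();
            System.out.println("first edge: " + src + " -> " + dst);
        }
    }
}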