At Freebase, we're mapping our large graphs into very large files of triples in HDFS and running large queries over them. Hadoop is optimized for streaming data off of disk, and we've found that trying to load a multi-GB graph into memory and then access it from a Hadoop task doesn't scale. Mapping the graph to an on-disk representation as a set of interlocking or overlapping subgraphs works very well.
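
To make the streaming style concrete, here's a rough sketch of a mapper that processes triples one line at a time (classic org.apache.hadoop.mapred API; the tab-separated subject/predicate/object format and the class name are just illustrative assumptions, not our actual code):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class TripleMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // Each call sees exactly one triple; the full graph never has
      // to fit in the task's RAM.
      public void map(LongWritable offset, Text line,
                      OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        String[] spo = line.toString().split("\t", 3);
        if (spo.length != 3) {
          return;  // skip malformed lines
        }
        // Key by subject so each reduce group is one node's outgoing
        // edges, i.e. one small subgraph.
        out.collect(new Text(spo[0]), new Text(spo[1] + "\t" + spo[2]));
      }
    }

Keying the output by subject is one simple way to carve the graph into the kind of overlapping subgraphs I mentioned, with each reducer seeing only the edges it needs.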

-Colin


Bhupesh Bansal wrote:
Hey guys,

We at LinkedIn are trying to run some large graph analysis problems on
Hadoop. The fastest way to run them would be to keep a copy of the whole
graph in RAM at all mappers (the graph is about 8 GB in RAM), but we have
a cluster of 8-core machines with 8 GB each.

What is the best way of doing that? Is there a way for multiple mappers on
the same machine to share a RAM cache? I read about Hadoop's
DistributedCache; it looks like it copies the file (from HDFS or HTTP)
locally onto the slaves, but not necessarily into RAM.
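
For reference, this is roughly the DistributedCache usage I was looking at (classic org.apache.hadoop.mapred API; the HDFS path is just a placeholder), and as far as I can tell it only stages the file on local disk:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class GraphCacheSetup {
      public static void configure(JobConf conf) throws Exception {
        // Copies the file from HDFS to every tasktracker's local
        // disk before tasks run; it does NOT load it into RAM or
        // share memory between mapper JVMs.
        DistributedCache.addCacheFile(
            new URI("/user/bhupesh/graph/graph-8g.bin"), conf);
        // Inside a mapper's configure(), the local copy shows up via:
        //   Path[] local = DistributedCache.getLocalCacheFiles(conf);
      }
    }

So each mapper JVM would still have to read the 8 GB file into its own heap, which is exactly what we can't afford on these machines.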

Best
Bhupesh
