Hey guys,
We at LinkedIn are trying to run some large-graph-analysis problems on Hadoop. The fastest way to run them would be to keep a copy of the whole graph in RAM in every mapper (the graph is about 8 GB in RAM), but our cluster is made of 8-core machines with only 8 GB of RAM each. What is the best way of doing that? Is there a way for multiple mappers on the same machine to share a single in-RAM cache?

I read about the Hadoop DistributedCache; it looks like it copies the file (from HDFS or HTTP) onto each slave's local disk, but not necessarily into RAM.

Best,
Bhupesh
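One approach worth noting (a sketch, not a definitive answer): if the graph file is shipped to each node via the DistributedCache, every mapper JVM can memory-map it read-only. The pages then live once in the OS page cache and are shared by all mapper JVMs on that node, instead of being copied into each JVM heap. The snippet below assumes the local path would come from `DistributedCache.getLocalCacheFiles(conf)` in a real job; note also that a single `MappedByteBuffer` is limited to 2 GB, so an 8 GB graph would need several mapped regions.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class SharedGraphCache {

    // In a real mapper, localPath would be one of the files returned by
    // DistributedCache.getLocalCacheFiles(conf) -- hypothetical wiring here.
    static MappedByteBuffer mapGraph(Path localPath) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(localPath.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            // Read-only mmap: the file's pages sit in the OS page cache once
            // and are shared by every JVM on the node that maps the same file.
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the serialized graph file.
        Path p = Files.createTempFile("graph", ".bin");
        Files.write(p, new byte[] {1, 2, 3, 4});
        MappedByteBuffer buf = mapGraph(p);
        System.out.println(buf.get(0) + buf.get(3)); // prints 5
    }
}
```

The trade-off is that the graph must be in a format usable directly off the buffer (e.g. adjacency arrays with fixed-width offsets); deserializing it into per-JVM Java objects would bring back the per-mapper RAM cost.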