Re: Breadth-first search

2012-12-11 Thread Avery Ching
We are running several Giraph applications in production using our 
version of Hadoop (Corona) at Facebook.  The part you have to be careful 
about is ensuring you have enough resources for your job to run.  But 
otherwise, we are able to run at FB scale (i.e., 1 billion+ nodes and many 
more edges).


Avery

On 12/11/12 5:58 AM, Gustavo Enrique Salazar Torres wrote:

Hi:

I implemented a graph algorithm to recommend content to our users. 
Although it is working (the implementation uses Mahout), it is very 
inefficient because I have to run many iterations in order to perform 
a breadth-first search on my graph.
I would like to use Giraph for that task, and I would like to know 
whether it is production-ready. I'm running jobs on Amazon EMR.


Thanks in advance.
Gustavo
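[For reference: BFS maps naturally onto Giraph's vertex-centric BSP model. The source vertex records distance 0 and messages its neighbors; every other vertex adopts the superstep at which its first message arrives, propagates once, and votes to halt. The class below is a minimal, self-contained plain-Java sketch of that superstep pattern; it deliberately does not use the Giraph API, and all names are illustrative:]

```java
import java.util.*;

// Superstep-style BFS: each pass over the frontier corresponds to one
// BSP superstep, mirroring how a Giraph vertex would propagate its
// distance once and then vote to halt.
public class BfsSupersteps {

    // graph: adjacency list, vertex id -> neighbor ids.
    // Returns the superstep (= BFS level) at which each reachable
    // vertex first received a message.
    public static Map<Integer, Integer> bfsLevels(
            Map<Integer, List<Integer>> graph, int source) {
        Map<Integer, Integer> level = new HashMap<>();
        level.put(source, 0);                      // superstep 0: only the source is active
        List<Integer> frontier = List.of(source);
        int superstep = 0;
        while (!frontier.isEmpty()) {
            superstep++;
            List<Integer> next = new ArrayList<>();
            for (int v : frontier) {
                for (int nbr : graph.getOrDefault(v, List.of())) {
                    if (!level.containsKey(nbr)) { // first message wins
                        level.put(nbr, superstep);
                        next.add(nbr);
                    }
                }
            }
            frontier = next;                       // vertices activated this superstep
        }
        return level;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g = Map.of(
            1, List.of(2, 3),
            2, List.of(4),
            3, List.of(4),
            4, List.of());
        Map<Integer, Integer> lv = bfsLevels(g, 1);
        System.out.println("level of 4 = " + lv.get(4)); // prints "level of 4 = 2"
    }
}
```

[Each iteration of the while loop is one superstep, and the "first message wins" check is what keeps total work proportional to the number of edges, instead of one full pass over the data per iteration as in the MapReduce/Mahout approach described above.]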




Re: Breadth-first search

2012-12-11 Thread Gustavo Enrique Salazar Torres
Hi Avery:

Regarding resources, I guess I won't need that much: our graph has only
60,000 nodes. I believe one c1.xlarge EC2 machine can handle this, or I can
scale if needed.

Thank you very much.
Gustavo

On Tue, Dec 11, 2012 at 4:40 PM, Avery Ching ach...@apache.org wrote:

 We are running several Giraph applications in production using our version
 of Hadoop (Corona) at Facebook.  The part you have to be careful about is
 ensuring you have enough resources for your job to run.  But otherwise, we
 are able to run at FB scale (i.e., 1 billion+ nodes and many more edges).

 Avery


 On 12/11/12 5:58 AM, Gustavo Enrique Salazar Torres wrote:

 Hi:

 I implemented a graph algorithm to recommend content to our users.
 Although it is working (the implementation uses Mahout), it is very
 inefficient because I have to run many iterations in order to perform a
 breadth-first search on my graph.
 I would like to use Giraph for that task, and I would like to know whether
 it is production-ready. I'm running jobs on Amazon EMR.

 Thanks in advance.
 Gustavo





Problems running PageRank benchmark

2012-12-11 Thread Gustavo Enrique Salazar Torres
Hi:

I checked out release 0.1 and, after compiling it, I tried to run this line:

hadoop jar giraph-0.1-jar-with-dependencies.jar
org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000 -w 2

but the job took too long to finish:

Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex
12/12/11 18:15:49 WARN bsp.BspOutputFormat: checkOutputSpecs:
ImmutableOutputCommiter will not check anything
12/12/11 18:15:49 INFO mapred.JobClient: Running job: job_201212111806_0003
12/12/11 18:15:50 INFO mapred.JobClient:  map 0% reduce 0%
12/12/11 18:16:06 INFO mapred.JobClient:  map 33% reduce 0%
12/12/11 18:21:11 INFO mapred.JobClient: Job complete: job_201212111806_0003
12/12/11 18:21:11 INFO mapred.JobClient: Counters: 5
12/12/11 18:21:11 INFO mapred.JobClient:   Job Counters
12/12/11 18:21:11 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=619750
12/12/11 18:21:11 INFO mapred.JobClient: Total time spent by all
reduces waiting after reserving slots (ms)=0
12/12/11 18:21:11 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
12/12/11 18:21:11 INFO mapred.JobClient: Launched map tasks=2
12/12/11 18:21:11 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=4181

I checked logs for one of the map tasks and found this:

2012-12-11 18:21:03,929 INFO org.apache.giraph.graph.BspServiceMaster:
checkWorkers: No response from partition 2 (could be master)
2012-12-11 18:21:03,929 ERROR org.apache.giraph.graph.BspServiceMaster:
checkWorkers: Did not receive enough processes in time (only 1 of 2
required).  This occurs if you do not have enough map tasks available
simultaneously on your Hadoop instance to fulfill the number of requested
workers.
2012-12-11 18:21:03,932 INFO org.apache.giraph.graph.BspServiceMaster:
setJobState:
{_stateKey:FAILED,_applicationAttemptKey:-1,_superstepKey:-1} on
superstep -1
2012-12-11 18:21:04,018 FATAL org.apache.giraph.graph.BspServiceMaster:
failJob: Killing job job_201212111806_0003
2012-12-11 18:21:04,142 INFO org.apache.giraph.graph.BspServiceMaster:
cleanup: Notifying master its okay to cleanup with
/_hadoopBsp/job_201212111806_0003/_cleanedUpDir/0_master
2012-12-11 18:21:04,159 INFO org.apache.giraph.graph.BspServiceMaster:
cleanUpZooKeeper: Node /_hadoopBsp/job_201212111806_0003/_cleanedUpDir
already exists, no need to create.
2012-12-11 18:21:04,161 INFO org.apache.giraph.graph.BspServiceMaster:
cleanUpZooKeeper: Got 1 of 3 desired children from
/_hadoopBsp/job_201212111806_0003/_cleanedUpDir
2012-12-11 18:21:04,161 INFO org.apache.giraph.graph.BspServiceMaster:
cleanedUpZooKeeper: Waiting for the children of
/_hadoopBsp/job_201212111806_0003/_cleanedUpDir to change since only got 1
nodes.
2012-12-11 18:21:05,013 WARN org.apache.giraph.zk.ZooKeeperManager:
onlineZooKeeperServers: Forced a shutdown hook kill of the ZooKeeper
process.

My hadoop version is 1.0.3. Is there any special configuration to be done?
Can anybody help me?

Thanks
Gustavo
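[The quoted error log actually names the cause: with -w 2, Giraph needs two worker map tasks running simultaneously, plus one map task for the master ("No response from partition 2 (could be master)"), but only 2 map tasks were launched, so only 1 of the 2 required workers checked in before the timeout. On a default single-node Hadoop 1.0.3 install, the per-tasktracker map-slot limit is 2, so either request fewer workers or raise the slot limit. A sketch of both options, assuming a standard Hadoop 1.x configuration (adjust paths and values to your install):]

```shell
# Option 1: request a single worker, which fits in 2 concurrent
# map slots (1 master + 1 worker):
hadoop jar giraph-0.1-jar-with-dependencies.jar \
  org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000 -w 1

# Option 2: raise the per-tasktracker map-slot limit in
# conf/mapred-site.xml, then restart the tasktracker:
#   <property>
#     <name>mapred.tasktracker.map.tasks.maximum</name>
#     <value>4</value>
#   </property>
```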


Re: Breadth-first search

2012-12-11 Thread Jan van der Lugt
Hi Gustavo,

If your graph fits in memory, you might be interested in Green-Marl, a
language tailored for graph processing:
https://github.com/stanford-ppl/Green-Marl
You can compile your Green-Marl program to an extremely fast C++ program,
but also to a Giraph program when your graph no longer fits in memory.

- Jan

On Tue, Dec 11, 2012 at 8:33 PM, Gustavo Enrique Salazar Torres 
gsala...@ime.usp.br wrote:

 Hi Avery:

 Regarding resources, I guess I won't need that much: our graph has only
 60,000 nodes. I believe one c1.xlarge EC2 machine can handle this, or I can
 scale if needed.

 Thank you very much.
 Gustavo

 On Tue, Dec 11, 2012 at 4:40 PM, Avery Ching ach...@apache.org wrote:

 We are running several Giraph applications in production using our
 version of Hadoop (Corona) at Facebook.  The part you have to be careful
 about is ensuring you have enough resources for your job to run.  But
 otherwise, we are able to run at FB scale (i.e., 1 billion+ nodes and many
 more edges).

 Avery


 On 12/11/12 5:58 AM, Gustavo Enrique Salazar Torres wrote:

 Hi:

 I implemented a graph algorithm to recommend content to our users.
 Although it is working (the implementation uses Mahout), it is very
 inefficient because I have to run many iterations in order to perform a
 breadth-first search on my graph.
 I would like to use Giraph for that task, and I would like to know whether
 it is production-ready. I'm running jobs on Amazon EMR.

 Thanks in advance.
 Gustavo