Re: Breadth First Search (BFS)
Hi, I have managed to implement it correctly after re-thinking my approach. The video described it in an asynchronous algorithm which, as you pointed out is not a distributed. I followed the BFS algorithm written in this paper: http://www.cc.gatech.edu/~bader/papers/GraphBSPonXMT-MTAAP2013.pdf. and managed to get it working. Thanks for the help, Ghufran On Mon, Apr 7, 2014 at 11:00 AM, Lukas Nalezenec < lukas.naleze...@firma.seznam.cz> wrote: > Hi, > I dont know what problem do you exactly solve and I have never written > distributed BFS > but I think that sharing queue using agreggator wont scale for larger > graphs. > The algorithm in video is basic its not distributed version. > > Lukas > > > On 3.4.2014 15:13, ghufran malik wrote: > > Hi, > > Lukas, I believe the queue list is essential to the BFS algorithm I am > trying? I am following this explanation given here: > > https://www.youtube.com/watch?v=zLZhSSXAwxI > > So for my output I want to have the vertex id followed by the number > representing the order in which it was visited so vertex 5 was the start so > it has a value of 0, it has neighbours 4, 8 and 2, the order in which they > would be visited should be 2-4-8 so I want their values to be 1,2,3 > respectively. > > Nishant, Yes I've seen those implementations before they were done by a > student at my university some time ago, however for the BFSSO, at least, it > give the depth of each vertex and not result as presented in the above > video. Looking back at this algorithm though has brought into question how > I originally understood how the BSP model works in Giraph. I am confused on > how it works based on 1 fact, I thought every vertex was initially active > and run the compute method, so why isit that after superstep 0 in this > algorithm, every other vertex is not set to 1? > > I could be flawed in my understanding but It seems to me that this > algorithm for superstep 0 sets the depth (value) for the start vertex to 0, > that's fine, after that it sends a message to all vertices that are its > neighbours to activate them (according to the code comments)? I thought all > vertices where initially active until they vote to halt? > > Pregel Paper: > "In superstep 0, every vertex is in the active state; all > active vertices participate in the computation of any given > superstep. A vertex deactivates itself by voting to halt." > > He then sets every other vertex value that's not the start vertex to a max > integer. The supersteps after checks if the vertex value is of a max value > and if it is set the value to that superstep number. So superstep 1 all the > remaining vertices should be set to 1? - this assumption of mine I believe > is wrong but I need help in understanding why. > > this was my original understanding until I ran the code on the following > graph: > > [0,0,[[1,1],[3,3]]] > [1,0,[[0,1],[2,2],[3,1]]] > [2,0,[[1,2],[4,4]]] > [3,0,[[0,3],[1,1],[5,0]]] > [4,0,[[2,4]]] > [5,0,[[3,0]]] > > and got the output: > > 5 2.0 > 0 1.0 > 2 1.0 > 1 0.0 > 3 1.0 > 4 2.0 > > Note that the start vertex id was 1. So 1 is 0 thats done if the first > superstep. then messages are sent to its neighbours and it seems to me that > only these are active in superstep 1 so they are given a value 1 and > messages are sent to their neighbours (4 and 5). Why are 4 and 5 not active > in superstep 1 as well and given a value 1? Only in superstep 2 once these > vertices have received messages are they taken in to consideration and give > a value 2. > > So in conclusion why aren't all the vertices given a value 1 in > superstep 1 as I thought all vertices should be active? Unless all vertices > are ONLY active in superstep 0 after which ones that do not receive a > message vote to a halt? > > Sorry for long read, > > Thanks, > > Ghufran > > > > On Thu, Apr 3, 2014 at 11:50 AM, nishant gandhi > wrote: > >> Check this out for BFS.. >> >> >> http://stackoverflow.com/questions/12253794/breadth-first-implentation-in-giraph-graphchi-or-pregel >> >> Nishant Gandhi >> M.Tech CSE >> IIT Patna >> >> >> On Thu, Apr 3, 2014 at 3:18 PM, Lukas Nalezenec < >> lukas.naleze...@firma.seznam.cz> wrote: >> >>> Hi, >>> It looks like you are using wrong algorithm. If you are doing simple BFS >>> you should not need to remember vertex ids. >>> >>> Lukas >>> >>> Lukas >>> >>> >>> On 2.4.2014 20:30, ghufran malik wrote: >>> >>> Hi >>> >>> I am trying to implement the BFS algorithm using Giraph 1.1.0. >>> >>> I have partly implemented it and am stuck on just one part. I have a >>> list of vertex ids and I want these ids to be seen in the next superstep. >>> Is it possible for me to just have the list in my BasicComputations class >>> or do I need to send it to an aggregator or master class of some sort to >>> store this list so that in the next superstep I can use this list of vertex >>> ids in my basiccomputations class? >>> >>> kind regards, >>> >>> Ghufran >>> >>> >>> >> > >
Re: Breadth First Search (BFS)
Hi, I dont know what problem do you exactly solve and I have never written distributed BFS but I think that sharing queue using agreggator wont scale for larger graphs. The algorithm in video is basic its not distributed version. Lukas On 3.4.2014 15:13, ghufran malik wrote: Hi, Lukas, I believe the queue list is essential to the BFS algorithm I am trying? I am following this explanation given here: https://www.youtube.com/watch?v=zLZhSSXAwxI So for my output I want to have the vertex id followed by the number representing the order in which it was visited so vertex 5 was the start so it has a value of 0, it has neighbours 4, 8 and 2, the order in which they would be visited should be 2-4-8 so I want their values to be 1,2,3 respectively. Nishant, Yes I've seen those implementations before they were done by a student at my university some time ago, however for the BFSSO, at least, it give the depth of each vertex and not result as presented in the above video. Looking back at this algorithm though has brought into question how I originally understood how the BSP model works in Giraph. I am confused on how it works based on 1 fact, I thought every vertex was initially active and run the compute method, so why isit that after superstep 0 in this algorithm, every other vertex is not set to 1? I could be flawed in my understanding but It seems to me that this algorithm for superstep 0 sets the depth (value) for the start vertex to 0, that's fine, after that it sends a message to all vertices that are its neighbours to activate them (according to the code comments)? I thought all vertices where initially active until they vote to halt? Pregel Paper: "In superstep 0, every vertex is in the active state; all active vertices participate in the computation of any given superstep. A vertex deactivates itself by voting to halt." He then sets every other vertex value that's not the start vertex to a max integer. The supersteps after checks if the vertex value is of a max value and if it is set the value to that superstep number. So superstep 1 all the remaining vertices should be set to 1? - this assumption of mine I believe is wrong but I need help in understanding why. this was my original understanding until I ran the code on the following graph: [0,0,[[1,1],[3,3]]] [1,0,[[0,1],[2,2],[3,1]]] [2,0,[[1,2],[4,4]]] [3,0,[[0,3],[1,1],[5,0]]] [4,0,[[2,4]]] [5,0,[[3,0]]] and got the output: 52.0 01.0 21.0 10.0 31.0 42.0 Note that the start vertex id was 1. So 1 is 0 thats done if the first superstep. then messages are sent to its neighbours and it seems to me that only these are active in superstep 1 so they are given a value 1 and messages are sent to their neighbours (4 and 5). Why are 4 and 5 not active in superstep 1 as well and given a value 1? Only in superstep 2 once these vertices have received messages are they taken in to consideration and give a value 2. So in conclusion why aren't all the vertices given a value 1 in superstep 1 as I thought all vertices should be active? Unless all vertices are ONLY active in superstep 0 after which ones that do not receive a message vote to a halt? Sorry for long read, Thanks, Ghufran On Thu, Apr 3, 2014 at 11:50 AM, nishant gandhi mailto:nishantgandh...@gmail.com>> wrote: Check this out for BFS.. http://stackoverflow.com/questions/12253794/breadth-first-implentation-in-giraph-graphchi-or-pregel Nishant Gandhi M.Tech CSE IIT Patna On Thu, Apr 3, 2014 at 3:18 PM, Lukas Nalezenec mailto:lukas.naleze...@firma.seznam.cz>> wrote: Hi, It looks like you are using wrong algorithm. If you are doing simple BFS you should not need to remember vertex ids. Lukas Lukas On 2.4.2014 20:30, ghufran malik wrote: Hi I am trying to implement the BFS algorithm using Giraph 1.1.0. I have partly implemented it and am stuck on just one part. I have a list of vertex ids and I want these ids to be seen in the next superstep. Is it possible for me to just have the list in my BasicComputations class or do I need to send it to an aggregator or master class of some sort to store this list so that in the next superstep I can use this list of vertex ids in my basiccomputations class? kind regards, Ghufran
Re: Breadth First Search (BFS)
Also as to my original question I understand that I need to register an agg to a master class to store a persistent list. On Thu, Apr 3, 2014 at 2:13 PM, ghufran malik wrote: > Hi, > > Lukas, I believe the queue list is essential to the BFS algorithm I am > trying? I am following this explanation given here: > > https://www.youtube.com/watch?v=zLZhSSXAwxI > > So for my output I want to have the vertex id followed by the number > representing the order in which it was visited so vertex 5 was the start so > it has a value of 0, it has neighbours 4, 8 and 2, the order in which they > would be visited should be 2-4-8 so I want their values to be 1,2,3 > respectively. > > Nishant, Yes I've seen those implementations before they were done by a > student at my university some time ago, however for the BFSSO, at least, it > give the depth of each vertex and not result as presented in the above > video. Looking back at this algorithm though has brought into question how > I originally understood how the BSP model works in Giraph. I am confused on > how it works based on 1 fact, I thought every vertex was initially active > and run the compute method, so why isit that after superstep 0 in this > algorithm, every other vertex is not set to 1? > > I could be flawed in my understanding but It seems to me that this > algorithm for superstep 0 sets the depth (value) for the start vertex to 0, > that's fine, after that it sends a message to all vertices that are its > neighbours to activate them (according to the code comments)? I thought all > vertices where initially active until they vote to halt? > > Pregel Paper: > "In superstep 0, every vertex is in the active state; all > active vertices participate in the computation of any given > superstep. A vertex deactivates itself by voting to halt." > > He then sets every other vertex value that's not the start vertex to a max > integer. The supersteps after checks if the vertex value is of a max value > and if it is set the value to that superstep number. So superstep 1 all the > remaining vertices should be set to 1? - this assumption of mine I believe > is wrong but I need help in understanding why. > > this was my original understanding until I ran the code on the following > graph: > > [0,0,[[1,1],[3,3]]] > [1,0,[[0,1],[2,2],[3,1]]] > [2,0,[[1,2],[4,4]]] > [3,0,[[0,3],[1,1],[5,0]]] > [4,0,[[2,4]]] > [5,0,[[3,0]]] > > and got the output: > > 5 2.0 > 0 1.0 > 2 1.0 > 1 0.0 > 3 1.0 > 4 2.0 > > Note that the start vertex id was 1. So 1 is 0 thats done if the first > superstep. then messages are sent to its neighbours and it seems to me that > only these are active in superstep 1 so they are given a value 1 and > messages are sent to their neighbours (4 and 5). Why are 4 and 5 not active > in superstep 1 as well and given a value 1? Only in superstep 2 once these > vertices have received messages are they taken in to consideration and give > a value 2. > > So in conclusion why aren't all the vertices given a value 1 in superstep > 1 as I thought all vertices should be active? Unless all vertices are ONLY > active in superstep 0 after which ones that do not receive a message vote > to a halt? > > Sorry for long read, > > Thanks, > > Ghufran > > > > On Thu, Apr 3, 2014 at 11:50 AM, nishant gandhi > wrote: > >> Check this out for BFS.. >> >> >> http://stackoverflow.com/questions/12253794/breadth-first-implentation-in-giraph-graphchi-or-pregel >> >> Nishant Gandhi >> M.Tech CSE >> IIT Patna >> >> >> On Thu, Apr 3, 2014 at 3:18 PM, Lukas Nalezenec < >> lukas.naleze...@firma.seznam.cz> wrote: >> >>> Hi, >>> It looks like you are using wrong algorithm. If you are doing simple BFS >>> you should not need to remember vertex ids. >>> >>> Lukas >>> >>> Lukas >>> >>> >>> On 2.4.2014 20:30, ghufran malik wrote: >>> >>> Hi >>> >>> I am trying to implement the BFS algorithm using Giraph 1.1.0. >>> >>> I have partly implemented it and am stuck on just one part. I have a >>> list of vertex ids and I want these ids to be seen in the next superstep. >>> Is it possible for me to just have the list in my BasicComputations class >>> or do I need to send it to an aggregator or master class of some sort to >>> store this list so that in the next superstep I can use this list of vertex >>> ids in my basiccomputations class? >>> >>> kind regards, >>> >>> Ghufran >>> >>> >>> >> >
Re: Breadth First Search (BFS)
Hi, Lukas, I believe the queue list is essential to the BFS algorithm I am trying? I am following this explanation given here: https://www.youtube.com/watch?v=zLZhSSXAwxI So for my output I want to have the vertex id followed by the number representing the order in which it was visited so vertex 5 was the start so it has a value of 0, it has neighbours 4, 8 and 2, the order in which they would be visited should be 2-4-8 so I want their values to be 1,2,3 respectively. Nishant, Yes I've seen those implementations before they were done by a student at my university some time ago, however for the BFSSO, at least, it give the depth of each vertex and not result as presented in the above video. Looking back at this algorithm though has brought into question how I originally understood how the BSP model works in Giraph. I am confused on how it works based on 1 fact, I thought every vertex was initially active and run the compute method, so why isit that after superstep 0 in this algorithm, every other vertex is not set to 1? I could be flawed in my understanding but It seems to me that this algorithm for superstep 0 sets the depth (value) for the start vertex to 0, that's fine, after that it sends a message to all vertices that are its neighbours to activate them (according to the code comments)? I thought all vertices where initially active until they vote to halt? Pregel Paper: "In superstep 0, every vertex is in the active state; all active vertices participate in the computation of any given superstep. A vertex deactivates itself by voting to halt." He then sets every other vertex value that's not the start vertex to a max integer. The supersteps after checks if the vertex value is of a max value and if it is set the value to that superstep number. So superstep 1 all the remaining vertices should be set to 1? - this assumption of mine I believe is wrong but I need help in understanding why. this was my original understanding until I ran the code on the following graph: [0,0,[[1,1],[3,3]]] [1,0,[[0,1],[2,2],[3,1]]] [2,0,[[1,2],[4,4]]] [3,0,[[0,3],[1,1],[5,0]]] [4,0,[[2,4]]] [5,0,[[3,0]]] and got the output: 5 2.0 0 1.0 2 1.0 1 0.0 3 1.0 4 2.0 Note that the start vertex id was 1. So 1 is 0 thats done if the first superstep. then messages are sent to its neighbours and it seems to me that only these are active in superstep 1 so they are given a value 1 and messages are sent to their neighbours (4 and 5). Why are 4 and 5 not active in superstep 1 as well and given a value 1? Only in superstep 2 once these vertices have received messages are they taken in to consideration and give a value 2. So in conclusion why aren't all the vertices given a value 1 in superstep 1 as I thought all vertices should be active? Unless all vertices are ONLY active in superstep 0 after which ones that do not receive a message vote to a halt? Sorry for long read, Thanks, Ghufran On Thu, Apr 3, 2014 at 11:50 AM, nishant gandhi wrote: > Check this out for BFS.. > > > http://stackoverflow.com/questions/12253794/breadth-first-implentation-in-giraph-graphchi-or-pregel > > Nishant Gandhi > M.Tech CSE > IIT Patna > > > On Thu, Apr 3, 2014 at 3:18 PM, Lukas Nalezenec < > lukas.naleze...@firma.seznam.cz> wrote: > >> Hi, >> It looks like you are using wrong algorithm. If you are doing simple BFS >> you should not need to remember vertex ids. >> >> Lukas >> >> Lukas >> >> >> On 2.4.2014 20:30, ghufran malik wrote: >> >> Hi >> >> I am trying to implement the BFS algorithm using Giraph 1.1.0. >> >> I have partly implemented it and am stuck on just one part. I have a >> list of vertex ids and I want these ids to be seen in the next superstep. >> Is it possible for me to just have the list in my BasicComputations class >> or do I need to send it to an aggregator or master class of some sort to >> store this list so that in the next superstep I can use this list of vertex >> ids in my basiccomputations class? >> >> kind regards, >> >> Ghufran >> >> >> >
Re: Breadth First Search (BFS)
Check this out for BFS.. http://stackoverflow.com/questions/12253794/breadth-first-implentation-in-giraph-graphchi-or-pregel Nishant Gandhi M.Tech CSE IIT Patna On Thu, Apr 3, 2014 at 3:18 PM, Lukas Nalezenec < lukas.naleze...@firma.seznam.cz> wrote: > Hi, > It looks like you are using wrong algorithm. If you are doing simple BFS > you should not need to remember vertex ids. > > Lukas > > Lukas > > > On 2.4.2014 20:30, ghufran malik wrote: > > Hi > > I am trying to implement the BFS algorithm using Giraph 1.1.0. > > I have partly implemented it and am stuck on just one part. I have a > list of vertex ids and I want these ids to be seen in the next superstep. > Is it possible for me to just have the list in my BasicComputations class > or do I need to send it to an aggregator or master class of some sort to > store this list so that in the next superstep I can use this list of vertex > ids in my basiccomputations class? > > kind regards, > > Ghufran > > >
Re: Breadth First Search (BFS)
Hi, It looks like you are using wrong algorithm. If you are doing simple BFS you should not need to remember vertex ids. Lukas Lukas On 2.4.2014 20:30, ghufran malik wrote: Hi I am trying to implement the BFS algorithm using Giraph 1.1.0. I have partly implemented it and am stuck on just one part. I have a list of vertex ids and I want these ids to be seen in the next superstep. Is it possible for me to just have the list in my BasicComputations class or do I need to send it to an aggregator or master class of some sort to store this list so that in the next superstep I can use this list of vertex ids in my basiccomputations class? kind regards, Ghufran
Breadth First Search (BFS)
Hi I am trying to implement the BFS algorithm using Giraph 1.1.0. I have partly implemented it and am stuck on just one part. I have a list of vertex ids and I want these ids to be seen in the next superstep. Is it possible for me to just have the list in my BasicComputations class or do I need to send it to an aggregator or master class of some sort to store this list so that in the next superstep I can use this list of vertex ids in my basiccomputations class? kind regards, Ghufran
Re: Breadth-first search
Dear Avery, Regarding this decision about resource allocation, do you have a methodology or a rule of thumb that helps you decide which setting is expected to perform well? For example, with a given input (number of graph vertices), can you estimate what number of workers and how much memory per worker would be optimal? Or the other way around: given a pool of resources (cores & memory), what's a reasonable graph size? That insight would be really interesting. Thanks, Alexandros On 11 December 2012 19:40, Avery Ching wrote: > We are running several Giraph applications in production using our version > of Hadoop (Corona) at Facebook. The part you have to be careful about is > ensuring you have enough resources for your job to run. But otherwise, we > are able to run at FB-scale (i.e. 1billion+ nodes, many more edges). > > Avery > > > On 12/11/12 5:58 AM, Gustavo Enrique Salazar Torres wrote: > >> Hi: >> >> I implemented a graph algorithm to recommend content to our users. >> Although it is working (implementation uses Mahout) it very inefficient >> because I have to run many iterations in order to perform a breadth-first >> search on my graph. >> I would like to use Giraph for that task. I would like to know if it is >> production ready. I'm running jobs on Amazon EMR. >> >> Thanks in advance. >> Gustavo >> > >
Re: Breadth-first search
Hi Gustavo, If your graph fits in memory, you might be interested Green-Marl, a language tailored for graph processing: https://github.com/stanford-ppl/Green-Marl You can compile your Green-Marl program to an extremely fast C++ program, but also to Giraph program when your graph does not fit in memory anymore. - Jan On Tue, Dec 11, 2012 at 8:33 PM, Gustavo Enrique Salazar Torres < gsala...@ime.usp.br> wrote: > Hi Avery: > > Regarding resources I guess I won't need that much, our graph has 60,000 > nodes only, I believe one c1.xlarge EC2 machine can handle this or scale if > needed. > > Thank you very much. > Gustavo > > On Tue, Dec 11, 2012 at 4:40 PM, Avery Ching wrote: > >> We are running several Giraph applications in production using our >> version of Hadoop (Corona) at Facebook. The part you have to be careful >> about is ensuring you have enough resources for your job to run. But >> otherwise, we are able to run at FB-scale (i.e. 1billion+ nodes, many more >> edges). >> >> Avery >> >> >> On 12/11/12 5:58 AM, Gustavo Enrique Salazar Torres wrote: >> >>> Hi: >>> >>> I implemented a graph algorithm to recommend content to our users. >>> Although it is working (implementation uses Mahout) it very inefficient >>> because I have to run many iterations in order to perform a breadth-first >>> search on my graph. >>> I would like to use Giraph for that task. I would like to know if it is >>> production ready. I'm running jobs on Amazon EMR. >>> >>> Thanks in advance. >>> Gustavo >>> >> >> > > > >
Re: Breadth-first search
Hi Avery: Regarding resources I guess I won't need that much, our graph has 60,000 nodes only, I believe one c1.xlarge EC2 machine can handle this or scale if needed. Thank you very much. Gustavo On Tue, Dec 11, 2012 at 4:40 PM, Avery Ching wrote: > We are running several Giraph applications in production using our version > of Hadoop (Corona) at Facebook. The part you have to be careful about is > ensuring you have enough resources for your job to run. But otherwise, we > are able to run at FB-scale (i.e. 1billion+ nodes, many more edges). > > Avery > > > On 12/11/12 5:58 AM, Gustavo Enrique Salazar Torres wrote: > >> Hi: >> >> I implemented a graph algorithm to recommend content to our users. >> Although it is working (implementation uses Mahout) it very inefficient >> because I have to run many iterations in order to perform a breadth-first >> search on my graph. >> I would like to use Giraph for that task. I would like to know if it is >> production ready. I'm running jobs on Amazon EMR. >> >> Thanks in advance. >> Gustavo >> > >
Re: Breadth-first search
We are running several Giraph applications in production using our version of Hadoop (Corona) at Facebook. The part you have to be careful about is ensuring you have enough resources for your job to run. But otherwise, we are able to run at FB-scale (i.e. 1billion+ nodes, many more edges). Avery On 12/11/12 5:58 AM, Gustavo Enrique Salazar Torres wrote: Hi: I implemented a graph algorithm to recommend content to our users. Although it is working (implementation uses Mahout) it very inefficient because I have to run many iterations in order to perform a breadth-first search on my graph. I would like to use Giraph for that task. I would like to know if it is production ready. I'm running jobs on Amazon EMR. Thanks in advance. Gustavo
Breadth-first search
Hi: I implemented a graph algorithm to recommend content to our users. Although it is working (implementation uses Mahout) it very inefficient because I have to run many iterations in order to perform a breadth-first search on my graph. I would like to use Giraph for that task. I would like to know if it is production ready. I'm running jobs on Amazon EMR. Thanks in advance. Gustavo