+1 for Tommaso's solution. Since not every algorithm needs a counter service, having an interface with different implementations (in-memory, ZooKeeper, etc.) should reduce the side effects.
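To make the proposal concrete, here is a minimal sketch, in Python for brevity, of what a pluggable counter could look like. VertexCounter and InMemoryVertexCounter are the names Tommaso suggests below; SharedStoreVertexCounter, make_counter and the "vertex.counter" config key are hypothetical. The lock-guarded dict only stands in for ZooKeeper and is not ZooKeeper's API; a real ZKVertexCounter would need versioned updates on a znode (for example a conditional setData with an expected version).

```python
from abc import ABC, abstractmethod
import threading


class VertexCounter(ABC):
    """Hypothetical counter API along the lines proposed in this thread."""

    @abstractmethod
    def increment(self, delta: int = 1) -> None: ...

    @abstractmethod
    def get(self) -> int: ...


class InMemoryVertexCounter(VertexCounter):
    """Peer-local count, roughly what each peer keeps today."""

    def __init__(self, initial: int = 0) -> None:
        self._count = initial

    def increment(self, delta: int = 1) -> None:
        self._count += delta

    def get(self) -> int:
        return self._count


class SharedStoreVertexCounter(VertexCounter):
    """Stand-in for a ZKVertexCounter: every peer reads and writes one shared
    value. A dict guarded by a lock replaces ZooKeeper in this sketch."""

    def __init__(self, store: dict, lock: threading.Lock,
                 key: str = "total_vertices") -> None:
        self._store, self._lock, self._key = store, lock, key
        with lock:
            store.setdefault(key, 0)  # first peer creates the shared entry

    def increment(self, delta: int = 1) -> None:
        with self._lock:
            self._store[self._key] += delta

    def get(self) -> int:
        with self._lock:
            return self._store[self._key]


def make_counter(conf: dict, store: dict, lock: threading.Lock) -> VertexCounter:
    """Pick an implementation from a hypothetical config key."""
    kind = conf.get("vertex.counter", "inmemory")
    if kind == "inmemory":
        return InMemoryVertexCounter()
    if kind == "shared":
        return SharedStoreVertexCounter(store, lock)
    raise ValueError(f"unknown counter type: {kind}")


if __name__ == "__main__":
    store, lock = {}, threading.Lock()
    p1 = make_counter({"vertex.counter": "shared"}, store, lock)
    p2 = make_counter({"vertex.counter": "shared"}, store, lock)
    p1.increment()   # Vert_1 on peer 1
    p2.increment()   # Vert_2 on peer 2
    p1.increment()   # addVertex() on peer 1
    print(p2.get())  # 3, visible without waiting for a sync
```

The point of the factory is the side-effect argument above: an algorithm that only looks at neighbors can keep the in-memory default and pay no coordination cost, while one that needs an up-to-date global total opts into the shared implementation.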
On 15 July 2013 15:51, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
> What about introducing a proper API for counting vertices, something like
> an interface VertexCounter with 2-3 implementations, such as
> InMemoryVertexCounter (basically the current one), a
> DistributedVertexCounter to implement the scenario where we use a separate
> BSP superstep to count them, and a ZKVertexCounter which handles vertex
> counts as per Chia-Hung's suggestion?
>
> We may also introduce something like a configuration variable to define
> whether all the vertices are needed or just the neighbors (and/or some
> other strategy).
>
> My 2 cents,
> Tommaso
>
> 2013/7/14 Chia-Hung Lin <cli...@googlemail.com>
>
>> Just my personal viewpoint: for a small amount of global information,
>> storing the state in ZooKeeper might be a reasonable solution.
>>
>> On 13 July 2013 21:28, andronat_asf <andronat_...@hotmail.com> wrote:
>> > Hello everyone,
>> >
>> > I'm working on HAMA-767 and I have some concerns about counters and
>> > scalability. Currently, every peer has a set of vertices and a variable
>> > that keeps the total number of vertices across all peers. In my case,
>> > I'm trying to add and remove vertices during the runtime of a job,
>> > which means that I have to update all those variables.
>> >
>> > My problem is that this is not efficient: for every operation (adding
>> > or removing a vertex) I need to update all peers, so I have to send lots
>> > of messages to make those updates (see the
>> > GraphJobRunner#countGlobalVertexCount method), and I believe this is
>> > neither correct nor scalable. Another problem is that, even if I update
>> > all those variables (at the cost of sending lots of messages to every
>> > peer), they will only be updated in the next superstep.
>> >
>> > e.g.:
>> >
>> > Peer 1:              Peer 2:
>> > Vert_1               Vert_2
>> > (Total_V = 2)        (Total_V = 2)
>> > addVertex()
>> > (Total_V = 3)
>> >                      getNumberOfV() => 2
>> >
>> > ------------------------ Sync ------------------------
>> >
>> >                      getNumberOfV() => 3
>> >
>> >
>> > Is there something like global counters or shared memory that can
>> > address this issue?
>> >
>> > P.S. I have a feeling that we don't need to track the total number of
>> > vertices, because vertex-centric algorithms rarely need global totals;
>> > they depend only on their neighbors (I might be wrong though).
>> >
>> > Thanks,
>> > Anastasis
>>
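The staleness in Anastasis's example can be reproduced with a small simulation. This is a simplified model of the behaviour described in his message, not Hama's GraphJobRunner code; Peer, add_vertex, remove_vertex and sync are hypothetical names, and "messages" are just list entries delivered at a simulated barrier.

```python
class Peer:
    """Toy model of a peer that keeps its own copy of the global vertex count."""

    def __init__(self, name, local_vertices, total_v):
        self.name = name
        self.vertices = set(local_vertices)
        self.total_v = total_v  # this peer's (possibly stale) copy of Total_V
        self.inbox = []         # deltas received, applied only at the barrier
        self.outbox = []        # (target peer, delta) pairs queued this superstep

    def add_vertex(self, v, peers):
        self.vertices.add(v)
        self._broadcast(+1, peers)

    def remove_vertex(self, v, peers):
        self.vertices.discard(v)
        self._broadcast(-1, peers)

    def _broadcast(self, delta, peers):
        self.total_v += delta  # the local copy changes immediately
        for p in peers:
            if p is not self:
                self.outbox.append((p, delta))  # one message per other peer

    def get_number_of_v(self):
        return self.total_v


def sync(peers):
    """Barrier: deliver queued messages, apply deltas, return the message count."""
    delivered = 0
    for p in peers:
        for target, delta in p.outbox:
            target.inbox.append(delta)
            delivered += 1
        p.outbox.clear()
    for p in peers:
        p.total_v += sum(p.inbox)
        p.inbox.clear()
    return delivered


if __name__ == "__main__":
    p1 = Peer("Peer 1", ["Vert_1"], 2)
    p2 = Peer("Peer 2", ["Vert_2"], 2)
    peers = [p1, p2]
    p1.add_vertex("Vert_3", peers)
    print(p1.get_number_of_v(), p2.get_number_of_v())  # 3 2
    sync(peers)
    print(p2.get_number_of_v())  # 3
```

The model makes both costs from the message explicit: with n peers, k vertex mutations in one superstep cost k * (n - 1) messages, and every remote copy of Total_V stays stale until the next barrier.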