[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-30 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424825#comment-13424825
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

We can provide two options to the user:
1. Use standard Netty messaging between workers and master to send aggregated 
values.
2. The master writes aggregated values to HDFS. Users should choose this option 
if their aggregated values are very large objects.

> Aggregators shouldn't use Zookeeper
> ---
>
> Key: GIRAPH-273
> URL: https://issues.apache.org/jira/browse/GIRAPH-273
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>
> We use Zookeeper znodes to transfer aggregated values from workers to master 
> and back. Zookeeper is supposed to be used for coordination, and it also has 
> a memory limit which prevents users from having aggregators with large value 
> objects. These are the reasons why we should implement aggregators gathering 
> and distribution in a different way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-30 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424862#comment-13424862
 ] 

Gianmarco De Francisci Morales commented on GIRAPH-273:
---

Hi,
good idea.
Regarding option 2, why would this be better than option 1 for large files?
In any case the files would need to be read/written to the network, and on HDFS 
they would also be replicated.
I don't think HDFS is a good place for temporary files.

I guess the best way to implement aggregators would be a Dremel-like solution 
with aggregation trees, so that you can reduce the pressure on the master while 
at the same time keeping the latency low. (But maybe for only hundreds of 
machines this is overkill.)





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-30 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424895#comment-13424895
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

For 2, I thought it would be better since we would have two-phase distribution: 
the master wouldn't send the message to every worker. 

The Dremel idea sounds interesting. My suggestion is to implement it in a simple 
way first, and if the need arises in the future we can look into it then.





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-30 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424964#comment-13424964
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

It seems Pregel already does tree-based reduction. From the Pregel paper:
"At the end of the superstep workers form a tree to reduce partially reduced 
aggregators into global values and deliver them to the master."





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-30 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424979#comment-13424979
 ] 

Gianmarco De Francisci Morales commented on GIRAPH-273:
---

Another reason to go for it, then.





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-30 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425089#comment-13425089
 ] 

Jakob Homan commented on GIRAPH-273:


Writing to HDFS is not a good idea. 





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-30 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425541#comment-13425541
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

Jakob, can you share the reasons for that? (I'm just trying to learn) Thanks





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425610#comment-13425610
 ] 

Avery Ching commented on GIRAPH-273:


I think that as an option, writing to HDFS should be fine, but the default 
should be in-memory, as writing to HDFS is likely to be a bit slow.  Again, 
moving this out of Zookeeper should improve our scalability a lot.  Even with, 
say, 100k aggregators, this shouldn't be an issue (assuming they are small 
objects).  The master doesn't require a lot of memory for other things, so 
keeping it in memory should be fine.





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-31 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425681#comment-13425681
 ] 

Gianmarco De Francisci Morales commented on GIRAPH-273:
---

IMHO, even as an option it does not make much sense.
What is the advantage of persisting (and replicating) aggregators on disk?
Especially if they are many and small, HDFS is the worst place.





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-31 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425936#comment-13425936
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

Ok, I'll implement it using just messaging then.

Do we have the address of master stored somewhere on the worker?





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-07-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426261#comment-13426261
 ] 

Avery Ching commented on GIRAPH-273:


I believe not, since the workers have always communicated with the master using 
ZooKeeper in the past.  See BspServiceMaster#becomeMaster(), which describes 
the master protocol for checking whether it is the master.  
BspService#masterElectionPath contains the location of the actual master; we 
just need to factor the common code for getting the master address out of 
BspServiceMaster#becomeMaster() and perhaps put it in BspService.  Note that 
you'll also have to add a few new methods for communication with the master, as 
none exist currently, and start up a Netty server there as well.





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-08-23 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440637#comment-13440637
 ] 

Eli Reisman commented on GIRAPH-273:


I like the master connection, in-memory option. Avery is right, memory pressure 
on the master is low since it doesn't do much work during the supersteps. This 
sounds like a great plan, and should be fairly quick in the supersteps too. 
The aggregation tree is a great idea because we can piggyback aggregation 
connections on existing worker connections for many levels up the tree, and 
hopefully only 2 workers will actually ever need to add extra connections to 
talk to the master.





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-08-23 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440675#comment-13440675
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

We actually ended up with something better than an aggregation tree. 

Say we have A aggregators and W workers. With the tree approach, the whole 
aggregation would take about:
A * (aggregation_time + transfer_time) * log W
What we can do instead is perform aggregations in a completely distributed way. 
Each aggregator would have a worker which owns it and does the aggregation for 
it, so we would end up with about:
A * (aggregation_time + transfer_time)
After performing aggregations, all workers would send the final values to the 
master, and after master.compute the aggregators would go back the same way. 
For applications without master compute, we can even skip sending aggregated 
values to the master altogether. 
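
To make the "each aggregator has an owning worker" idea concrete, here is a minimal sketch. The thread does not say how owners are chosen, so hashing the aggregator name is purely an assumption, and the class and method names (AggregatorOwnership, ownerOf) are hypothetical rather than Giraph API. The point is only that every worker can compute the same owner locally, without coordination.

```java
/** Hypothetical sketch: maps each aggregator to exactly one owning worker. */
final class AggregatorOwnership {
  private AggregatorOwnership() { }

  /**
   * Returns the index (0 .. workerCount - 1) of the worker that owns the
   * aggregator with the given name. The function is pure, so every worker
   * that evaluates it reaches the same answer.
   */
  static int ownerOf(String aggregatorName, int workerCount) {
    if (workerCount <= 0) {
      throw new IllegalArgumentException("workerCount must be positive");
    }
    // floorMod keeps the result non-negative even for negative hash codes.
    return Math.floorMod(aggregatorName.hashCode(), workerCount);
  }

  public static void main(String[] args) {
    int workers = 4; // assumed cluster size, for illustration only
    for (String name : new String[] {"sum", "max", "histogram"}) {
      System.out.println(name + " -> worker " + ownerOf(name, workers));
    }
  }
}
```

With a scheme like this, each worker sends its partial value for an aggregator straight to the owner, so the reduction for different aggregators proceeds on different workers in parallel.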

Is having all the workers connect to the master an issue? The master will have 
the same number of connections as any other worker, and in this approach we 
just send a smaller amount of data through each of the connections, instead of 
having that same amount sent through just two.





[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-08-31 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445738#comment-13445738
 ] 

Eli Reisman commented on GIRAPH-273:


The tree sounds a lot better to me. What is the case where an aggregator is a 
large chunk of data? Won't it mostly be a counter, numerical value, or other 
fixed-size datatype that is augmented or altered by the values at each worker? 

As Netty has added more features, I have already seen memory and network issues 
start to build up as we scale to 1000s of worker nodes, which is definitely 
our use case here, so that many more connections and that much more traffic on 
the network kind of scares me. I hear what you're saying about the master not 
having much to do right now though.

If you have some use cases where aggregators accumulate instead of aggregate, 
then maybe having all the extra network connections would be worth it. Log W 
time on a small message passed around that way seems OK to me.

If Pregel likes it, Eli likes it. ;)







[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-08-31 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445839#comment-13445839
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

One of the applications we are working on has huge maps as aggregator values.

This approach only adds one more connection per worker and gives the master 
the same number of connections the workers have. I don't see that as a 
significant change: right now we have O(W^2) connections in the system and we 
are adding just O(W).
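
As a rough back-of-the-envelope illustration of the O(W^2) versus O(W) point, the sketch below counts connections under one assumption that is not stated in the thread: every worker opens a connection to every other worker, giving W * (W - 1) directed connections. The class and method names are hypothetical.

```java
/** Hypothetical sketch: compares existing and added connection counts. */
final class ConnectionCount {
  private ConnectionCount() { }

  /** Directed worker-to-worker connections if every worker dials every other worker. */
  static long workerToWorker(long workers) {
    return workers * (workers - 1);
  }

  /** Extra connections if each worker also opens one connection to the master. */
  static long workerToMaster(long workers) {
    return workers;
  }

  public static void main(String[] args) {
    for (long w : new long[] {10, 100, 1000}) {
      long existing = workerToWorker(w);
      long added = workerToMaster(w);
      System.out.printf("W=%d existing=%d added=%d (+%.2f%%)%n",
          w, existing, added, 100.0 * added / existing);
    }
  }
}
```

Under that assumption, the relative increase shrinks as W grows, which is the gist of the argument above.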

I was planning on adding some additional option (as default) which would be 
used when we have just a few small aggregators (for example having one 
worker own all the aggregators). But for the big aggregators case, I think the 
way described above is better than the tree approach.



[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-08-31 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446099#comment-13446099
 ] 

Eli Reisman commented on GIRAPH-273:


That makes a lot of sense, maybe that's the right way to go then. What I was 
thinking (just for reference) is along these lines:

We aggregate all values in aggregators at each worker during the compute() 
cycle, so the total messages per aggregation at each superstep come to (# of 
workers) * (# of aggregators in that application).

We set a single ZK node at the end of each superstep that the master creates 
once all workers have put up nodes to say they are done with that superstep. 
When this new node appears, workers start sending their aggregated values from 
that superstep. They have their own Worker ID number already, and they can get 
"-w" (the total workers in the application run) from Configuration. So then 
they have a sort of heap swim() function that takes these two values and gives 
back their parent node in the tree. Since all other network activity has ceased 
for a moment, passing messages along should not be too expensive compared to 
the volumes we send during the work phases of a superstep already. If they get 
aggregator messages, they pass them to their parent. It gets a bit busy at the 
top of the tree, but even then our typical messaging should be much more 
volume, so it ought to be OK if we got this far without crashing already?

So... at the top, workers 1 and 2 report to the top of the heap, which is the 
master (worker 0), and that is where all the final aggregating takes place, 
since the master has nothing to do. Alternatively, the top couple of nodes in 
the tree (as determined by their height in the tree) might do some 
sub-aggregating to cut down on message volume. This could be set up to whatever 
tests the best (probably some sub-aggregating).

Finally, when the master gets (# of workers) * (# of aggregators) values (or, 
with sub-agg, 2 messages * # of aggregators), it writes to that znode a 
child that says "time to move on with the superstep" and we go forward. If we 
pass a timeout without hearing from everyone, retry or app fail, etc.

This means no new connections except a single one to the master from nodes 1 & 
2, which is nice. We would love to scale up further into the 4 figures, and the 
# of connections maintained per worker is starting to become a bit of a 
problem. I definitely agree that doing the work at the master when possible for 
aggregators (as with the 1 connect from each worker method) is good because the 
master is not busy in our current scheme.
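
The heap-style parent lookup described above could be sketched roughly as follows. This assumes a binary heap layout with the master as node 0 and workers numbered 1..W, which matches "workers 1, 2 report to the master (worker 0)"; the class and method names are hypothetical. In this layout the parent depends only on the node's own ID, so the total worker count is used just for validation.

```java
/** Hypothetical sketch: binary-heap aggregation tree rooted at the master (node 0). */
final class AggregationTree {
  private AggregationTree() { }

  /** Returns the node that the given worker forwards aggregator messages to. */
  static int parentOf(int nodeId, int workerCount) {
    if (nodeId < 1 || nodeId > workerCount) {
      throw new IllegalArgumentException("nodeId must be in 1.." + workerCount);
    }
    // Standard 0-indexed binary heap: children of p are 2p + 1 and 2p + 2.
    return (nodeId - 1) / 2;
  }

  /** Number of hops a message from the given worker takes to reach the master. */
  static int hopsToMaster(int nodeId, int workerCount) {
    int hops = 0;
    for (int n = nodeId; n != 0; n = parentOf(n, workerCount)) {
      hops++;
    }
    return hops;
  }

  public static void main(String[] args) {
    int workers = 7; // assumed cluster size, for illustration only
    for (int id = 1; id <= workers; id++) {
      System.out.println("worker " + id + " -> parent " + parentOf(id, workers)
          + ", hops to master: " + hopsToMaster(id, workers));
    }
  }
}
```

Only nodes 1 and 2 have the master as their parent, so only they need a connection to it; the hop count grows logarithmically with the worker ID.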

Anyway, great work on the other sections so far, that's a lot of code to write! 
Looking forward to this!




[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-08-31 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446215#comment-13446215
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

I still don't see how a constant difference of one connection per worker can 
cause a problem if the problem wasn't already there. I do agree that in most 
applications the approach I described is not needed, but those are the cases in 
which even storing aggregators on ZooKeeper was working fine, and basically 
whatever we do won't matter much. 

I already have the implementation for this; there are just a few smaller bugs I 
have to fix, and I also have to wait for RPC to be removed first (I didn't want 
to leave a big mess with two completely different code paths). In my 
implementation there is no need for another barrier before sending aggregators, 
since they can be aggregated on the owning worker even before the computation 
there is done. And a worker waits to receive aggregated values from all the 
others before sending them to the master.
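
The owner-side behavior described above (fold in partial values as they arrive, and only release the result once every worker has reported) might look something like the sketch below. It is not the actual patch: the class name, the use of a sum, and the BitSet bookkeeping are all assumptions made for illustration.

```java
import java.util.BitSet;

/** Hypothetical sketch: a sum aggregator held by its owning worker. */
final class OwnedSumAggregator {
  private final int workerCount;
  private final BitSet reported;
  private long total;

  OwnedSumAggregator(int workerCount) {
    if (workerCount <= 0) {
      throw new IllegalArgumentException("workerCount must be positive");
    }
    this.workerCount = workerCount;
    this.reported = new BitSet(workerCount);
  }

  /** Folds in one worker's partial value as soon as it arrives; no barrier needed. */
  synchronized void addPartial(int workerId, long partial) {
    if (workerId < 0 || workerId >= workerCount) {
      throw new IllegalArgumentException("unknown worker " + workerId);
    }
    if (reported.get(workerId)) {
      throw new IllegalStateException("worker " + workerId + " already reported");
    }
    reported.set(workerId);
    total += partial;
  }

  /** True once every worker, including the owner itself, has contributed. */
  synchronized boolean isComplete() {
    return reported.cardinality() == workerCount;
  }

  /** The value that would be sent to the master; only valid when complete. */
  synchronized long finalValue() {
    if (!isComplete()) {
      throw new IllegalStateException("still waiting for some workers");
    }
    return total;
  }
}
```

A caller would invoke addPartial from whatever handler receives aggregator messages, and send finalValue() to the master once isComplete() returns true.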



[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-08-31 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446231#comment-13446231
 ] 

Avery Ching commented on GIRAPH-273:


One extra connection per worker should definitely be fine.  I want to note that 
while distributed aggregators solve the issue of having a master be a 
single point of contention (and memory), we can still implement a tree-based 
solution on top of that.  One does not preclude the other.  



[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-08-31 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446301#comment-13446301
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

That's true, we can implement several different approaches and decide which one 
to use based on the current application needs.



[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-09-03 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447414#comment-13447414
 ] 

Eli Reisman commented on GIRAPH-273:


Wanted to add something: rereading the thread here, I think I can address the 
HDFS concern from earlier: the NameNode unduly suffers when HDFS is loaded up 
with lots of tiny files.




[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-10-19 Thread Maja Kabiljo (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480609#comment-13480609
 ] 

Maja Kabiljo commented on GIRAPH-273:
-

https://reviews.apache.org/r/7673/



[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-10-19 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480652#comment-13480652
 ] 

Avery Ching commented on GIRAPH-273:


Thanks for the work, Maja. This is a great improvement: it allows any number of 
aggregators and gives good performance for large aggregators.  Does anyone have 
any objections?



[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-10-25 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484499#comment-13484499
 ] 

Avery Ching commented on GIRAPH-273:


+1.  Let's commit this at the end of day if no one has anything else to say.

> Aggregators shouldn't use Zookeeper
> ---
>
> Key: GIRAPH-273
> URL: https://issues.apache.org/jira/browse/GIRAPH-273
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Attachments: GIRAPH-273.diff
>
>
> We use Zookeeper znodes to transfer aggregated values from workers to master 
> and back. Zookeeper is supposed to be used for coordination, and it also has 
> a memory limit which prevents users from having aggregators with large value 
> objects. These are the reasons why we should implement aggregators gathering 
> and distribution in a different way.



[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-10-25 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484623#comment-13484623
 ] 

Avery Ching commented on GIRAPH-273:


End of day is here.  Please commit.



[jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-10-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484631#comment-13484631
 ] 

Hudson commented on GIRAPH-273:
---

Integrated in Giraph-trunk-Commit #255 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/255/])
GIRAPH-273: Aggregators shouldn't use Zookeeper (Revision 1402363)

 Result = SUCCESS
maja : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1402363
Files : 
* /giraph/trunk/CHANGELOG
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/AggregatorsBenchmark.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/MasterClient.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/ServerData.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/AggregatedValueOutputStream.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/AggregatorOutputStream.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/AggregatorUtils.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/AllAggregatorServerData.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/CountingCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/CountingOutputStream.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/OwnerAggregatorServerData.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/SendAggregatedValueCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/SendAggregatorCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/WorkerAggregatorRequestProcessor.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/aggregators/package-info.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyMasterClient.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyMasterClientServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyMasterServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerAggregatorRequestProcessor.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/handler/MasterRequestServerHandler.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/ByteArrayRequest.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/MasterRequest.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/RequestType.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendAggregatorsToMasterRequest.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendAggregatorsToOwnerRequest.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendAggregatorsToWorkerRequest.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendWorkerAggregatorsRequest.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/AggregatorsTestVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/AggregatorHandler.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspService.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GraphMapper.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/MasterAggregatorHandler.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/WorkerAggregatorHandler.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExpectedBarrier.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/graph/TestAggregatorsHandling.java



Re: [jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-08-31 Thread Eli Reisman
With 2000 workers, that's 2000 extra connections in the system. We run
Giraph/Netty on the same cluster as existing jobs that use Hadoop RPC so
network resources are sometimes at a premium. These jobs are often running
on the same boxes as our worker mappers are running, and the scheduling is
not under our control or particularly suited to Giraph. I'm not too
familiar with the aggregator code but it seems like if you have an idea for
an implementation that doesn't use a barrier, I agree with Avery that this
doesn't preclude the tree option in that scenario either.

On the other hand, if you have a specialized use case, maybe the easiest
thing would be to do what it takes to make your map aggregator work however
you like and have it be command-line optional, and just leave the existing
ZK implementation in place for the rest of the use cases. Have you had
problems with needing more standard aggregators and ZK nodes not holding
enough data, or is this map aggregator driving your need for this feature?
Can I ask what algorithm you're implementing that requires a globally
aggregated map at every superstep? Have you guys noticed performance or
speed issues with the existing ZK implementation as you add aggregators to
an application?

Anyway I'm not firmly for or against any of this stuff, just curious. If
you find an implementation that works for you that sounds great. If it was
optional with the existing version or the tree available, that would
probably save us some headache here when we share a cluster (which is
almost all the time.)



Re: [jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-09-01 Thread Maja Kabiljo
In the case you mentioned, you already have a million connections; that's
why I don't see how 2k more of them make a difference. Maybe I'm missing
something here.
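
A rough back-of-the-envelope sketch of that comparison. It assumes, purely for illustration, that every pair of workers keeps one open connection (a full mesh); the class and method names are hypothetical and the real connection pattern may differ:

```java
/** Back-of-the-envelope connection counts; the full-mesh assumption is illustrative only. */
final class ConnectionCount {
    /** Connections if every pair of workers keeps one channel open (assumed full mesh). */
    static long fullMesh(long workers) {
        return workers * (workers - 1) / 2;
    }

    /** Extra connections if every worker also opens one channel to the master. */
    static long workersToMaster(long workers) {
        return workers;
    }

    public static void main(String[] args) {
        long workers = 2000;
        long mesh = fullMesh(workers);
        long extra = workersToMaster(workers);
        // Prints both counts and the relative increase caused by the master channels.
        System.out.printf("mesh=%d extra=%d (%.2f%% more)%n",
            mesh, extra, 100.0 * extra / mesh);
    }
}
```

Under that assumption 2000 workers already hold 1,999,000 pairwise connections, so 2000 additional master connections are a small relative increase, although they all terminate on a single host.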

The reason why this can be done without an additional barrier is that
aggregated values which we receive from other workers can be treated in
the same way we treat values given in vertex.compute - we can just
aggregate them right away. It should be doable with the tree approach too -
we can send the values as soon as we are done with the computation and
have received values from our children in the tree, if any.
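
The barrier-free tree idea can be sketched as follows. This is a minimal illustration, not Giraph's actual aggregator code: TreeAggregatorNode and its methods are hypothetical names, and the Consumer stands in for a network send to the parent. The combine function is assumed to be associative and commutative, so values can be merged in any arrival order.

```java
import java.util.function.BinaryOperator;
import java.util.function.Consumer;

/** Hypothetical tree node; not one of Giraph's aggregator classes. */
final class TreeAggregatorNode<T> {
    private final BinaryOperator<T> combine; // assumed associative and commutative
    private final Consumer<T> sendToParent;  // stands in for a network send
    private int pendingChildren;
    private boolean localDone;
    private boolean sent;
    private T partial;

    TreeAggregatorNode(T initial, int children,
                       BinaryOperator<T> combine, Consumer<T> sendToParent) {
        this.partial = initial;
        this.pendingChildren = children;
        this.combine = combine;
        this.sendToParent = sendToParent;
    }

    /** Called for each value aggregated locally during computation. */
    synchronized void aggregate(T value) {
        partial = combine.apply(partial, value);
    }

    /** Called when a child's partial result arrives; it is merged immediately. */
    synchronized void onChildValue(T value) {
        partial = combine.apply(partial, value);
        pendingChildren--;
        maybeSend();
    }

    /** Called once this node has finished its own computation. */
    synchronized void onLocalComputationDone() {
        localDone = true;
        maybeSend();
    }

    /** Forwards the partial result once, when nothing else is outstanding. */
    private void maybeSend() {
        if (localDone && pendingChildren == 0 && !sent) {
            sent = true;
            sendToParent.accept(partial);
        }
    }

    public static void main(String[] args) {
        TreeAggregatorNode<Long> root =
            new TreeAggregatorNode<>(0L, 1, Long::sum, v -> System.out.println("total=" + v));
        TreeAggregatorNode<Long> leaf =
            new TreeAggregatorNode<>(0L, 0, Long::sum, root::onChildValue);
        root.aggregate(5L);
        leaf.aggregate(3L);
        root.onLocalComputationDone(); // nothing is sent yet: the child is still pending
        leaf.onLocalComputationDone(); // the leaf sends, then the root forwards the total
    }
}
```

Because incoming child values are folded in on arrival, the only waiting is the natural dependency on children, not a separate synchronization step.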

I guess we can also leave the current implementation as one of the options;
that didn't occur to me, thanks. Since aggregators are written to the same
znode as some other data, that should be the least possible overhead for
cases with just a few simple value aggregators.

I'm not sure whether performance is affected when a few aggregators are
added (another guy on the team is working on the application), but I don't
think we can get far enough to notice it because of the ZooKeeper memory
limit. Avery, can you take the question about our application? (I'm not
sure what we are allowed to share publicly and what not :-))






Re: [jira] [Commented] (GIRAPH-273) Aggregators shouldn't use Zookeeper

2012-09-01 Thread Eli Reisman
Hey, if your application is very specific and internal then never mind, I
don't need to know ;) I am not entirely familiar with the aggregator code
but if it is placed in the superstep cycle currently in a way that avoids a
barrier, that sounds terrific. If ZK is just too small then some kind
of networking solution sounds better, especially if you've already taken
the time to write it!

We are in a situation where not only are some of our worker tasks on the
same box, but they are sharing resources with other mappers and tasks from
other kinds of MR jobs all the time, so there's just a lot of resources in
use per box that any Giraph user here will have 0 control over, and Giraph
is not very tolerant to mid-job changes in activity levels of the cluster
as of now. We also plan on scaling to as many workers as possible to
parallelize the work and spread the load. So for all these reasons, any way
to avoid extra connections when there's no downside seems good to me.

It isn't the end of the world to make all the new connections, but it
didn't seem needed to make aggregators work as I understood the reason for
aggregators. That's why I asked about the nature of the algorithm you guys
are implementing.

I guess what it all boils down to is what the purpose of an aggregator is.
The tree and ZK are no good if you're sending large amounts of data. I was
under the assumption this feature was a "reducer" in that you aggregate a
single result from many data points during the compute cycle, and pass it
up the chain until all results from all workers are aggregated to a single
value at the master for global consumption. If you need to pass a map
through this system, maybe that data needs to be reduced at the
worker.compute() level before traversing the network, or maybe the
aggregator system is not ideal for that use case?
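
To illustrate the worker-level reduction suggested above, here is a minimal sketch in which each worker folds the maps produced by its vertices into one map before anything crosses the network. WorkerMapReducer and reduceLocally are hypothetical names, and summing counts per key is an assumed merge rule chosen only for the example:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical worker-side pre-aggregation; not part of Giraph's API. */
final class WorkerMapReducer {
    /** Folds per-vertex count maps into one map, summing values that share a key. */
    static Map<String, Long> reduceLocally(List<Map<String, Long>> perVertex) {
        Map<String, Long> combined = new HashMap<>();
        for (Map<String, Long> m : perVertex) {
            m.forEach((key, count) -> combined.merge(key, count, Long::sum));
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> perVertex = List.of(
            Map.of("a", 1L, "b", 2L),
            Map.of("a", 4L),
            Map.of("c", 7L));
        // Only this single combined map would need to leave the worker.
        System.out.println(reduceLocally(perVertex));
    }
}
```

How much this helps depends on key overlap: if vertices mostly contribute distinct keys, the combined map is nearly as large as the inputs and the data volume problem remains.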

If it's the only way to make your application work, then maybe what we're
doing is expanding the contract of what an aggregator is into a shared
global data store at the master? I think that's where I was confused about
needing an all-workers-to-master back channel just for aggregators. If
there's no way to use reduction-style aggregation to make the algorithm
work then it sounds like this needs to be done.

On Sat, Sep 1, 2012 at 12:22 AM, Maja Kabiljo  wrote:

> In the case you mentioned, you already have a million connections; that's
> why I don't see how 2k more of them make a difference. Maybe I'm missing
> something here.
>
> The reason why this can be done without an additional barrier is that
> aggregated values which we receive from other workers can be treated in
> the same way we treat values given in vertex.compute - we can just
> aggregate them right away. It should be doable with the tree approach too -
> we can send the values as soon as we are done with the computation and
> have received values from our children in the tree, if any.
>
> I guess we can also leave the current implementation as one of the options;
> that didn't occur to me, thanks. Since aggregators are written to the same
> znode as some other data, that should be the least possible overhead for
> cases with just a few simple value aggregators.
>
> I'm not sure whether performance is affected when a few aggregators are
> added (another guy on the team is working on the application), but I don't
> think we can get far enough to notice it because of the ZooKeeper memory
> limit. Avery, can you take the question about our application? (I'm not
> sure what we are allowed to share publicly and what not :-))
>
>
>
> On 9/1/12 12:25 AM, "Eli Reisman"  wrote:
>
> >With 2000 workers, that's 2000 extra connections in the system. We run
> >Giraph/Netty on the same cluster as existing jobs that use Hadoop RPC so
> >network resources are sometimes at a premium. These jobs are often running
> >on the same boxes as our worker mappers are running, and the scheduling is
> >not under our control or particularly suited to Giraph. I'm not too
> >familiar with the aggregator code but it seems like if you have an idea
> >for
> >an implementation that doesn't use a barrier, I agree with Avery that this
> >doesn't preclude the tree option in that scenario either.
> >
> >On the other hand, if you have a specialized use case, maybe the easiest
> >thing would be to do what it takes to make your map aggregator work
> >however
> >you like and have it be command-line optional, and just leave the existing
> >ZK implementation in place for the rest of the use cases. Have you had
> >problems with needing more standard aggregators and ZK nodes not holding
> >enough data, or is this map aggregator driving your need for this feature?
> >Can I ask what algorithm you're implementing that requires a globally
> >aggregated map at every superstep? Have you guys noticed performance or
> >speed issues with the existing ZK implementation as you add aggregators to
> >an application?
> >
> >Anyway I'm not firmly for or against any of this stuff, just curious. If
> >you find an implementation that wo