[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115239#comment-13115239 ]

Avery Ching commented on GIRAPH-12:
---

Are Hadoop metrics better than simply using Runtime? We do this here: https://github.com/apache/giraph/blob/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java#L559

Or perhaps http://download.oracle.com/javase/6/docs/api/java/lang/management/package-summary.html? I haven't used it, but it's been suggested on Stack Overflow: http://download.oracle.com/javase/6/docs/api/java/lang/management/MemoryMXBean.html

> Investigate communication improvements
> --
>
>                 Key: GIRAPH-12
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-12
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>            Reporter: Avery Ching
>            Assignee: Hyunsik Choi
>            Priority: Minor
>         Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
>
>
> Currently every worker will start up a thread to communicate with every other
> worker. Hadoop RPC is used for communication. For instance, if there are
> 400 workers, each worker will create 400 threads. This ends up using a lot
> of memory, even with the option
> -Dmapred.child.java.opts="-Xss64k".
> It would be good to investigate using frameworks like Netty or rolling
> our own to improve this situation. By moving away from Hadoop RPC, we would
> also make compatibility with different Hadoop versions easier.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
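To compare the two options Avery mentions, here is a minimal Java sketch that reads heap usage both through `Runtime` (used heap = `totalMemory() - freeMemory()`) and through `java.lang.management`'s `MemoryMXBean`, which additionally reports non-heap usage. The class `MemorySnapshot` and its method names are hypothetical. Because this issue is about per-thread overhead, the sketch also reads the live thread count from `ThreadMXBean`; as far as I know, thread stacks are native memory that shows up in neither the heap nor the non-heap figure, so the thread count is only an indirect proxy for that cost.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper comparing Runtime and java.lang.management memory readings. */
public class MemorySnapshot {
    private static final long MB = 1024L * 1024L;

    /** Heap in use according to Runtime: current heap size minus its free part. */
    public static long runtimeUsedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Heap in use according to MemoryMXBean. */
    public static long mxBeanUsedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Non-heap memory in use (e.g. code cache, class metadata); Runtime has no equivalent. */
    public static long mxBeanUsedNonHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
    }

    /** Number of live threads in this JVM, daemon and non-daemon. */
    public static int liveThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("heap used (Runtime):      " + runtimeUsedHeapBytes() / MB + " MB");
        System.out.println("heap used (MemoryMXBean): " + mxBeanUsedHeapBytes() / MB + " MB");
        System.out.println("non-heap used:            " + mxBeanUsedNonHeapBytes() / MB + " MB");
        System.out.println("live threads:             " + liveThreadCount());
    }
}
```

The two heap figures should be close but need not match exactly, since each call takes its own snapshot of a heap that may change between calls.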
[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114709#comment-13114709 ]

Hyunsik Choi commented on GIRAPH-12:

I have thought about question 3, that is, how we can measure memory usage while Giraph is running. Probably the most basic way is to use Hadoop metrics (http://www.cloudera.com/blog/2009/03/hadoop-metrics/). However, this approach requires changing the _hadoop-metrics.properties_ file, which may not be allowed on most large clusters, e.g., the Yahoo! cluster that Avery can access.

If that is impossible, we could implement a thread class that mimics Hadoop metrics: it would periodically measure the JVM's memory usage and send the measurements to a specific remote server. What do you think?
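A rough sketch of the reporter thread Hyunsik proposes, assuming Java 8 or later. The class `MemoryReporter`, its comma-separated sample format, and the `Consumer<String>` sink are all hypothetical; the sink stands in for the connection to the remote server, which is left out so the example stays self-contained. It uses one daemon thread from a `ScheduledExecutorService` rather than a hand-written `Thread` subclass.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical periodic JVM memory reporter, loosely modeled on Hadoop metrics' push style. */
public class MemoryReporter implements AutoCloseable {
    private final String workerId;
    private final Consumer<String> sink; // stand-in for "send to a remote server"
    private final ScheduledExecutorService scheduler;

    public MemoryReporter(String workerId, Consumer<String> sink) {
        this.workerId = workerId;
        this.sink = sink;
        // A single daemon thread, so the reporter never keeps the JVM alive by itself.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "memory-reporter");
            t.setDaemon(true);
            return t;
        });
    }

    /** One sample: worker id, timestamp (ms), heap used, heap committed, live threads. */
    public String sample() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return workerId + "," + System.currentTimeMillis() + ","
            + heap.getUsed() + "," + heap.getCommitted() + "," + threads;
    }

    /** Starts sending a sample every periodMillis milliseconds. */
    public void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sink.accept(sample());
            } catch (RuntimeException e) {
                // An uncaught exception would suppress all later runs of this task.
                System.err.println("memory-reporter: " + e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print to stdout here instead of sending to a remote server.
        try (MemoryReporter reporter = new MemoryReporter("worker-0", System.out::println)) {
            reporter.start(500);
            Thread.sleep(1600);
        }
    }
}
```

In a real worker the sink would write to a socket or HTTP endpoint, and `close()` would be called when the task finishes.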
[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114646#comment-13114646 ]

Hyunsik Choi commented on GIRAPH-12:

Sorry for my late response, too. I was out of town on personal business and have just come home.

The previous experiments were too simple; in fact, they could not show any meaningful result. I'm sorry about that. As for question 3, this issue originated from memory usage, so I should have measured memory usage. I'll answer your three questions soon :)