[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyunsik Choi updated GIRAPH-12: ------------------------------- Attachment: GIRAPH-12_1.patch As Avery mentioned, in the current architecture, each worker requires N threads that communicate with N remote peers. This may incur severe context-switching overheads (especially when all messages are flushed) and more memory consumption. Firstly, I considered about replacing RPC system to another one. However, it is not simple work. I need more time. Instead, I have considered an alternative way to employ ThreadPoolExecutor in order to adjust active threads. When Giraph deals with large graphs, the performance of Giraph is usually bounded on network bandwidth. I think that this approach would be effective. In addition, I tried to reduce the synchronization area, where BasicRPCCommunicator (374-394 lines) sends large buffered messages to specific peers. I attached the patch in progress. Now, I cannot access to real hadoop cluster for one week. I didn't test this in real cluster. Besides, all unit test are passed. How about this approach? Could you review this? > Investigate communication improvements > -------------------------------------- > > Key: GIRAPH-12 > URL: https://issues.apache.org/jira/browse/GIRAPH-12 > Project: Giraph > Issue Type: Improvement > Reporter: Avery Ching > Assignee: Hyunsik Choi > Priority: Minor > Attachments: GIRAPH-12_1.patch > > > Currently every worker will start up a thread to communicate with every other > workers. Hadoop RPC is used for communication. For instance if there are > 400 workers, each worker will create 400 threads. This ends up using a lot > of memory, even with the option > -Dmapred.child.java.opts="-Xss64k". > It would be good to investigate using frameworks like Netty or custom roll > our own to improve this situation. By moving away from Hadoop RPC, we would > also make compatibility of different Hadoop versions easier. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira