First of all, our main problem is that the current system requires a lot of memory, especially the graph module. As you might already know, the main memory consumer is the message queue.
To solve this problem, we considered using local disk space, e.g., DiskQueue and SpillingQueue. However, those queues are basically unable to bundle and group messages by destination server in a memory-efficient way, so I don't think that approach is the right one.

My solution for reducing memory usage without degrading performance is to store serializable message objects as byte arrays in the queue. In the graph case, I expect 3X ~ 6X better memory efficiency than before, since each GraphJobMessage consists of multiple objects (the destination vertex ID and the message value).

In 0.6.4, the outgoing queue was replaced with an outgoing bundles manager, which showed a nice memory improvement. Now I want to start refactoring the incoming queue. My plan is to add an incoming bundles manager. Bundles can also simply be written to local disk when there is not enough memory, so the incoming bundles manager could take over a role similar to DiskQueue and SpillingQueue in the future.

If you have any other opinions, please let me know. If there are no objections, I'll do it.

--
Best Regards, Edward J. Yoon
CEO at DataSayer Co., Ltd.
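
P.S. To make the byte-array idea a bit more concrete, here is a rough sketch. It is not the actual bundles manager code: the class name ByteArrayMessageBundle and its methods are made up for illustration, the vertex ID is assumed to be a string and the value a double, and it uses plain java.io streams where the real code would presumably use the existing message serialization. The point is only that each message is serialized into one shared buffer as it arrives, so the queue holds a single byte array per bundle and not one object graph per message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a bundle that keeps messages as serialized bytes. */
final class ByteArrayMessageBundle {
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(buffer);
  private int count = 0;

  /** Serializes one (destination vertex ID, value) message into the shared buffer. */
  void add(String vertexId, double value) {
    try {
      out.writeUTF(vertexId);   // 2-byte length prefix + encoded characters
      out.writeDouble(value);   // 8 bytes
      count++;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Number of messages added so far. */
  int size() {
    return count;
  }

  /** The serialized bundle; the same bytes could be sent or written to local disk. */
  byte[] toBytes() {
    try {
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  /** Decodes a bundle back into "vertexId=value" strings, in insertion order. */
  static List<String> decode(byte[] bytes, int count) {
    List<String> messages = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      for (int i = 0; i < count; i++) {
        String vertexId = in.readUTF();
        double value = in.readDouble();
        messages.add(vertexId + "=" + value);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return messages;
  }
}
```

In this sketch a message with a two-character vertex ID costs 12 bytes in the buffer, with no per-message object headers or references. An incoming bundles manager would keep one such bundle per source and decode lazily when the messages are consumed.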
