> Q1: Is Hama going to participate in GSOC 2016 ?

Sure, why not?

> Q2: In the image below, I see an interesting behavior of Hama but I am not sure why the behavior is like this.

Can you tell us which version you used? My rough guess is that the master task can receive incoming message bundles concurrently when the number of tasks is large.

--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:bsikan...@apache.org]
Sent: Tuesday, January 19, 2016 12:28 AM
To: dev@hama.apache.org
Subject: Question regarding Hama synchronization behavior and GSOC

Hi,

I have 2 questions regarding Hama.

Q1: Is Hama going to participate in GSoC 2016?

Q2: In the image below, I see an interesting behavior of Hama, but I am not sure why it behaves like this.

http://imgur.com/cVsfL1x

On the x-axis, I have the total amount of data that I need to process. On the y-axis, I have the time in minutes, aggregated over 200 iterations. Each line in the plot represents a different number of Hama tasks (peers) used to process the data. Overall, this plot shows the *total time that the master task waits for slave tasks to synchronize (over 200 iterations, in minutes).*

Note:
1) Total time the master waits for slaves in *1 iteration* = (time of slave processing) + *(time of synchronization)*. The plot shows only the *time in synchronization*, aggregated over *200 iterations*. I am using this plot to study the time taken by Hama in synchronization.
2) The total data is divided equally among all the tasks. For example, if I use 10 tasks to process 10K data, each task gets 1000. If I use 20 tasks to process 10K, each gets 500.

In the plot, for example, the blue line represents 10 tasks. If I process 10,000 files in 200 iterations, the master waits almost 3 minutes for the slaves to synchronize. If you look closely, as I *increase* the *number of tasks* used to process the data, the *time* the master waits for the *slaves to synchronize* starts to *decrease*. For example, look at the points at 50K data: for 30 tasks the master waits ~10 minutes, for 40 tasks it waits only ~6 minutes, and for 50 tasks it took ~4 minutes.

Q: My question is how to interpret this information. The answer I came up with is that the *outgoing message queue* of each task is smaller when I use more tasks to process the data and bigger when I use fewer tasks. For example, if a task has to send 1000 messages to the master, its outgoing queue will be bigger and will take more time to send than that of a task with 500 outgoing messages.

So, is my interpretation correct, or is something else going on here? Any insight would be helpful.

Regards,
Behroz
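
The queue-size interpretation above can be made concrete with a small sketch. It is only an illustration, not Hama code: it assumes each task sends one message per data item to the master and that the wait scales linearly with the per-task outgoing queue length. The function names `messages_per_task` and `scaled_wait` are hypothetical.

```python
def messages_per_task(total_items: int, num_tasks: int) -> int:
    """Items (and, by assumption, outgoing messages) per task under an equal split."""
    return total_items // num_tasks


def scaled_wait(baseline_minutes: float, baseline_tasks: int,
                num_tasks: int, total_items: int) -> float:
    """Scale a measured wait linearly with per-task queue length.

    Assumption: synchronization wait is proportional to the number of
    messages each task has to send, and nothing else.
    """
    baseline_queue = messages_per_task(total_items, baseline_tasks)
    queue = messages_per_task(total_items, num_tasks)
    return baseline_minutes * queue / baseline_queue


if __name__ == "__main__":
    total = 50_000  # the 50K data point from the plot
    for tasks in (30, 40, 50):
        queue = messages_per_task(total, tasks)
        # Baseline: ~10 minutes measured with 30 tasks.
        wait = scaled_wait(10.0, 30, tasks, total)
        print(f"{tasks} tasks: {queue} messages per task, "
              f"linear estimate {wait:.1f} min")
```

Under these assumptions the estimate for 40 and 50 tasks comes out at roughly 7.5 and 6 minutes, which can be compared with the ~6 and ~4 minutes read off the plot to judge how much of the trend queue length alone would explain.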