Hi,

> Q1: Is Hama going to participate in GSOC 2016 ?
>
> Sure, why not?

--> Great. I would like to participate in this GSOC. Do we already have
some potential projects? Jira does not seem to list any.

> Q2: In the image below, I see an interesting behavior of Hama but I am
> not sure why the behavior is like this.
>
> Can you tell us what version you used? I roughly guess master task can
> receive incoming message bundles concurrently if number of tasks is
> large.

--> I am using 0.7.0. OK, but can a slave send messages to the master
concurrently when its queue is large? I ask because it seems that slaves
with a large outgoing queue take more time.

Regards,
Behroz

On Tue, Jan 19, 2016 at 1:59 AM, Edward J. Yoon <[email protected]> wrote:

> > Q1: Is Hama going to participate in GSOC 2016 ?
>
> Sure, why not?
>
> > Q2: In the image below, I see an interesting behavior of Hama but I am
> > not sure why the behavior is like this.
>
> Can you tell us what version you used?
>
> I roughly guess master task can receive incoming message bundles
> concurrently if number of tasks is large.
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:[email protected]]
> Sent: Tuesday, January 19, 2016 12:28 AM
> To: [email protected]
> Subject: Question regarding Hama synchronization behavior and GSOC
>
> Hi,
> I have 2 questions regarding Hama.
>
> Q1: Is Hama going to participate in GSOC 2016 ?
>
> Q2: In the image below, I see an interesting behavior of Hama but I am
> not sure why the behavior is like this.
>
> http://imgur.com/cVsfL1x
>
> On the x-axis, I have the total amount of data that I need to process.
> On the y-axis, I have the time in minutes, aggregated over 200
> iterations. Each line in the plot represents a different number of Hama
> tasks (peers) used to process the data.
> Overall, this plot shows the total time that the master task waits for
> the slave tasks to synchronize (for 200 iterations, in minutes).
>
> Note:
> 1) Total time the master waits for slaves in 1 iteration =
>    (time of slave processing) + (time of synchronization)
>    The plot shows only the time in synchronization, aggregated over
>    200 iterations. I am using this plot to study the time taken by
>    Hama in synchronization.
>
> 2) The total data is divided equally among all the tasks. For example,
>    if I use 10 tasks to process 10K data, each task gets 1000. If I
>    use 20 tasks to process 10K, each gets 500.
>
> In the plot, for example, the blue line represents 10 tasks. If I
> process 10,000 files in 200 iterations, the master waits almost 3
> minutes for the slaves to synchronize.
>
> If you look closely, when I *increase* the number of tasks used to
> process the data, the time the master spends waiting for the slaves to
> synchronize starts to *decrease*. For example, look at the points at
> 50K data: with 30 tasks the master waits ~10 minutes, with 40 tasks
> only ~6 minutes, and with 50 tasks ~4 minutes.
>
> Q: How should I interpret this information?
> The answer I came up with is that the outgoing message queue of each
> task is smaller when I use more tasks and bigger when I use fewer
> tasks. For example, if a task has to send 1000 messages to the master,
> its outgoing queue is bigger and takes more time to send than that of
> a task with 500 outgoing messages. So, is my interpretation correct,
> or is something else going on here? Any insight would be helpful.
>
> Regards,
> Behroz
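
A rough way to check the queue-size interpretation against the three readings quoted for 50K data is to divide each waiting time by the per-task share of the data. This is only a sketch: the minutes are read off the plot by eye, the helper names are made up for illustration, and it assumes one outgoing message per data item, which the message suggests but does not confirm. If the wait were simply proportional to the per-task queue size, the three ratios would come out roughly equal.

```python
# Readings quoted in the message for 50K data items (approximate, read off the plot).
TOTAL_ITEMS = 50_000
WAIT_MINUTES = {30: 10.0, 40: 6.0, 50: 4.0}  # tasks -> total sync wait over 200 iterations


def items_per_task(total_items: int, tasks: int) -> float:
    """Equal split of the data among tasks, as described in note 2."""
    return total_items / tasks


def wait_per_item(total_items: int, tasks: int, wait_minutes: float) -> float:
    """Minutes of sync wait per item held by one task.

    Assumed model: one outgoing message per item, so this ratio would be
    roughly constant if wait time were proportional to queue size alone.
    """
    return wait_minutes / items_per_task(total_items, tasks)


ratios = {t: wait_per_item(TOTAL_ITEMS, t, w) for t, w in WAIT_MINUTES.items()}

for tasks, ratio in ratios.items():
    share = items_per_task(TOTAL_ITEMS, tasks)
    print(f"{tasks} tasks: {share:.0f} items/task, {ratio:.4f} min per item")
```

With these readings the ratio falls from about 0.0060 to 0.0040 minutes per item as the task count grows from 30 to 50, so the wait appears to shrink somewhat faster than the per-task share alone would predict. Given how coarse the readings are, this is a hint rather than evidence that something besides queue size is involved.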
