[ 
https://issues.apache.org/jira/browse/HAMA-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929751#action_12929751
 ] 

Filipe Manana commented on HAMA-282:
------------------------------------

Hi Edward,

Thanks for looking into it.
Yes, the failing test went unnoticed before the addition of the 
GroboTestingJUnit library.

About the line:

superstepCounter -= tip.getTaskStatus(status.getTaskId()).getSuperstepCount();

My reasoning was the following:

A job in progress has one or more tasks (1 task per peer/groom server), which 
means the job's superstep count matches the sum of all its tasks' superstep 
counts. Right?
TaskInProgress seems to represent more than one task: the current task attempt 
plus any previous (failed) attempts, as in Hadoop MapReduce. So I thought:

In getSuperstepCounter() of JobInProgress, iterate over the list of 
TaskInProgress, get the TaskStatus of each TaskInProgress, get the superstep 
counter from each TaskStatus and sum them all.
The problem with this is that in order to get the TaskStatus of a 
TaskInProgress, I need to supply the task ID, which I only seem to know 
when JobInProgress.updateTaskStatus() is invoked.
Also, I thought that to update the superstep count of the job, I need to 
decrement it by the number of supersteps of the previous TaskStatus and then 
increment it by the number of supersteps in the new TaskStatus. An 
alternative would be to reset the superstep count of each TaskStatus to zero 
(in GroomServer / Peer) after each heartbeat.
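
To make the decrement-then-increment idea concrete, here is a rough sketch. 
It is not the patch itself: DeltaSuperstepJob and its method signatures are 
hypothetical stand-ins for JobInProgress, and it assumes each heartbeat 
reports a task's cumulative superstep count.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the job's bookkeeping; not Hama's JobInProgress.
class DeltaSuperstepJob {
    // Last cumulative superstep count reported by each task, keyed by task ID.
    private final Map<String, Long> lastReported = new HashMap<>();
    private long superstepCounter = 0;

    // Assumed to be called once per heartbeat with the task's cumulative count.
    void updateTaskStatus(String taskId, long reportedCount) {
        // put() returns the previous value for this task, or null on first report.
        Long previous = lastReported.put(taskId, reportedCount);
        if (previous != null) {
            superstepCounter -= previous; // remove the stale contribution
        }
        superstepCounter += reportedCount; // add the fresh contribution
    }

    long getSuperstepCounter() {
        return superstepCounter;
    }
}
```

With two tasks that each report 1 and then 2 supersteps, the counter in this 
sketch ends at 4, because each task's old value is removed before its new 
value is added.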


Also, in your approach, you do:

  public void updateTaskStatus(TaskInProgress tip, TaskStatus status) {
    tip.updateStatus(status); // update tip
    
    superstepCounter = status.getSuperstepCount();
  }

With this, I think the superstep count of the job will be the superstep count 
of whichever task reported its status last.
For example, if you have 2 tasks for a job (each one running on a different 
groom/peer), and each one has 2 supersteps done when they finish, the job's 
superstep count should be 2+2=4, right?
With that approach (your suggestion) I think it will end up as 2.
But correct me if I am wrong :)
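
The arithmetic of that two-task example can be checked with a small 
standalone sketch. SuperstepAggregationDemo and its two methods are 
hypothetical and exist only to contrast overwriting with summing; they are 
not part of Hama.

```java
// Hypothetical demo class; contrasts the two ways of combining task counts.
class SuperstepAggregationDemo {
    // Mimics "superstepCounter = status.getSuperstepCount()":
    // each status update overwrites the job counter.
    static long overwrite(long[] finalTaskCounts) {
        long counter = 0;
        for (long count : finalTaskCounts) {
            counter = count;
        }
        return counter;
    }

    // Mimics summing the superstep counts of all tasks of the job.
    static long sum(long[] finalTaskCounts) {
        long counter = 0;
        for (long count : finalTaskCounts) {
            counter += count;
        }
        return counter;
    }

    public static void main(String[] args) {
        long[] twoTasks = {2, 2}; // two tasks, 2 supersteps each
        System.out.println("overwrite: " + overwrite(twoTasks)); // prints "overwrite: 2"
        System.out.println("sum: " + sum(twoTasks));             // prints "sum: 4"
    }
}
```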

> Add superstep counter
> ---------------------
>
>                 Key: HAMA-282
>                 URL: https://issues.apache.org/jira/browse/HAMA-282
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.2.0
>            Reporter: Edward J. Yoon
>            Assignee: Filipe Manana
>             Fix For: 0.2.0
>
>         Attachments: hama-282.patch, HAMA-282_v02.patch, hama-282_v03.patch, 
> hama-282_v04.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
