[jira] [Commented] (MAPREDUCE-4381) Make PROGRESS_INTERVAL of org.apache.hadoop.mapred.Task a tunable

Shrinivas Joshi (JIRA) Mon, 02 Jul 2012 15:41:01 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405403#comment-13405403 ]


Shrinivas Joshi commented on MAPREDUCE-4381:
--------------------------------------------

Thanks for the code review.
I agree that this could create scalability issues: setting the progress
interval to a very small value may generate excessive status update events.
Can we address this by enforcing a lower bound on the value of the progress
interval that the user can set? If so, how does 500 milliseconds sound as the
lower bound?
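
For illustration only, a lower bound like this could be enforced roughly as
follows. This is a sketch, not the attached patch: java.util.Properties stands
in for JobConf so the snippet is self-contained, the property key is a
hypothetical name, and falling back to the 3000 msec default on an unparsable
value is an assumption.

```java
import java.util.Properties;

// Sketch only: Properties stands in for Hadoop's JobConf, and the property
// key below is hypothetical (the key used by the patch is not shown here).
final class ProgressIntervalSketch {
    static final String PROGRESS_INTERVAL_KEY =
            "mapred.task.progress.interval.ms";             // hypothetical name
    static final long DEFAULT_PROGRESS_INTERVAL_MS = 3000L; // current hard-coded value
    static final long MIN_PROGRESS_INTERVAL_MS = 500L;      // proposed lower bound

    // Returns the configured interval, never lower than the proposed bound.
    static long getProgressInterval(Properties conf) {
        String raw = conf.getProperty(PROGRESS_INTERVAL_KEY);
        if (raw == null) {
            return DEFAULT_PROGRESS_INTERVAL_MS; // property not set
        }
        long requested;
        try {
            requested = Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            // Assumption: an unparsable value falls back to the default.
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
        // Clamp values below the bound up to the bound.
        return Math.max(requested, MIN_PROGRESS_INTERVAL_MS);
    }
}
```

Clamping silently is one option; logging a warning or rejecting the job
configuration outright would be alternatives worth discussing.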
I will address the comments in your first and second bullets above in the
revised version of this patch, along with the other changes.
As you may have seen, I have included a short description of the new property
in src/mapred/mapred-default.xml. Is there a more appropriate file or
location where this should be documented?
Since this patch only makes progress_interval a tunable, would it suffice to 
test whether the value returned by JobConf matches the one set in 
mapred-site.xml?
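
The shape of such a check might look like the sketch below. It is not a real
Hadoop test: a small stdlib XML reader stands in for JobConf loading
mapred-site.xml, and the property key is a hypothetical name. An actual test
would presumably build a JobConf and assert on the value it returns.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch only: a stand-in for reading a Hadoop-style site file.
final class SiteFileRoundTripSketch {
    static final String KEY = "mapred.task.progress.interval.ms"; // hypothetical name

    // Collects <property><name>..</name><value>..</value></property> pairs.
    static Map<String, String> loadSiteXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse site file", e);
        }
    }

    // Returns the interval from the site file, or defaultMs if it is absent.
    static long readInterval(String siteXml, long defaultMs) {
        String raw = loadSiteXml(siteXml).get(KEY);
        return raw == null ? defaultMs : Long.parseLong(raw);
    }
}
```

A test along these lines would set the property to, say, 1000 in the site
file, read it back, and also confirm that an unset property yields the
3000 msec default.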

                
> Make PROGRESS_INTERVAL of org.apache.hadoop.mapred.Task a tunable
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4381
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4381
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task, tasktracker
>            Reporter: Shrinivas Joshi
>            Priority: Minor
>         Attachments: progress_interval.patch
>
>
> Currently PROGRESS_INTERVAL is a hard-coded value and is set to 3000 msec. We 
> tried making it a tunable and experimented with different values. In some 
> cases setting it to a smaller value like 1000 msec helps significantly 
> improve performance of short running jobs such as piEstimator. This is 
> because the task threads do not end up blocking for as many as 3 seconds for 
> their last progress update event. We also noticed close to 14% improvement on 
> Mahout KMeans iteration jobs which take more than 5 minutes on the test 
> cluster that we are using. Please let me know if this seems to be a good 
> idea. I have an initial patch that I have attached here. This is based on 
> branch-1 tree. It may need some rework on MRv2 based branches I think. Also 
> note that I have not changed the variable naming style for PROGRESS_INTERVAL 
> even though it is not a public static final anymore. I can revise the patch 
> if there are no objections to this idea. 
> Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
