[
https://issues.apache.org/jira/browse/HADOOP-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542122
]
amareshwari edited comment on HADOOP-1900 at 11/13/07 3:57 AM:
----------------------------------------------------------------------------
With the patch attached, I ran sort benchmarks on a 390-node cluster and a
120-node cluster. The performance is almost the same as with trunk.
To simulate busyness at the job tracker, I ran the sort benchmarks on a 120-node
cluster with number of handlers=4 and max queue size per handler=10, but
there are drops and lost task trackers both with the patch and without it.
Thus the cluster size factor of (clusterSize/50+1) is fine, but the busy factor
has to be tuned more.
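For illustration, the cluster size factor can be evaluated for the two
benchmark clusters. This is only a sketch: the class and method names are
hypothetical, and integer division is an assumption (it is what the expression
would do with int operands in Java).

```java
public class ClusterSizeFactor {
    // Assumption: integer division, as (clusterSize/50+1) would evaluate on ints.
    static int clusterSizeFactor(int clusterSize) {
        return clusterSize / 50 + 1;
    }

    public static void main(String[] args) {
        // The two cluster sizes used in the sort benchmarks.
        for (int size : new int[] {120, 390}) {
            System.out.println(size + " nodes -> factor " + clusterSizeFactor(size));
        }
        // prints "120 nodes -> factor 3" and "390 nodes -> factor 8"
    }
}
```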
I propose the following to tune the busy factor:
We have thresholdDropCount = clusterSize/10;
We increment busyFactor by HEARTBEAT_BUSY_FACTOR (say 2 secs) for each multiple
of 10% of the cluster size in drops.
if (dropCount > thresholdDropCount) {
    busyFactor += (dropCount/thresholdDropCount)*HEARTBEAT_BUSY_FACTOR;
}
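The increment rule can be sketched as a small runnable class. The class and
method names are hypothetical, the 2-second step is the "say 2 secs" value
from the proposal, and the clamp of the threshold to at least 1 is my
assumption (without it, a cluster of fewer than 10 nodes would divide by zero).

```java
public class BusyFactorIncrement {
    // "say 2 secs" in the proposal; the real value is a tuning choice.
    static final int HEARTBEAT_BUSY_FACTOR = 2;

    // Returns the new busy factor (in seconds) after observing dropCount drops.
    static int updateBusyFactor(int busyFactor, int dropCount, int clusterSize) {
        // Assumption: clamp to 1 so that clusterSize < 10 does not divide by zero.
        int thresholdDropCount = Math.max(1, clusterSize / 10);
        if (dropCount > thresholdDropCount) {
            // Integer division: one step per full multiple of the threshold.
            busyFactor += (dropCount / thresholdDropCount) * HEARTBEAT_BUSY_FACTOR;
        }
        return busyFactor;
    }

    public static void main(String[] args) {
        // 120-node cluster: thresholdDropCount = 12.
        System.out.println(updateBusyFactor(0, 12, 120)); // prints 0 (not above threshold)
        System.out.println(updateBusyFactor(0, 13, 120)); // prints 2
        System.out.println(updateBusyFactor(0, 30, 120)); // prints 4
    }
}
```

Note that with integer division, 13 drops and 23 drops both add a single step
on a 120-node cluster; whether that granularity is desirable is open.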
If the job tracker is not busy for '_notBusyPeriod_', then we will decrement
busyFactor by HEARTBEAT_BUSY_FACTOR.
We have 2 RPCs to be processed at the job tracker, i.e. heartbeat and task
completion events. Let the processing time for an RPC be 2 seconds.
Here, notBusyPeriod is calculated as:
notBusyPeriod = (clusterSize/#handlers)*processingTime*2;
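As a worked example, the formula and the decrement rule can be sketched
together. The names below are hypothetical, the 4 handlers are taken from the
simulated-busy run on the 120-node cluster, and flooring the busy factor at 0
is my assumption (the proposal does not state a lower bound).

```java
public class NotBusyPeriod {
    // Two kinds of RPC per tracker: heartbeat and task completion events.
    static final int RPC_KINDS = 2;
    // "say 2 secs" decrement step from the proposal.
    static final long HEARTBEAT_BUSY_FACTOR = 2;

    // notBusyPeriod = (clusterSize/#handlers)*processingTime*2, in seconds.
    static long notBusyPeriodSecs(int clusterSize, int handlers, long processingTimeSecs) {
        return (clusterSize / handlers) * processingTimeSecs * RPC_KINDS;
    }

    // Decrement the busy factor once the tracker has been quiet long enough.
    static long decayBusyFactor(long busyFactor, long quietSecs, long notBusyPeriodSecs) {
        if (quietSecs >= notBusyPeriodSecs) {
            // Assumption: never go below zero.
            return Math.max(0L, busyFactor - HEARTBEAT_BUSY_FACTOR);
        }
        return busyFactor;
    }

    public static void main(String[] args) {
        long period = notBusyPeriodSecs(120, 4, 2);
        System.out.println(period);                          // prints 120
        System.out.println(decayBusyFactor(4, 60, period));  // prints 4 (not quiet long enough)
        System.out.println(decayBusyFactor(4, 120, period)); // prints 2
    }
}
```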
To stabilize:
consider that we have drops at time t, and we increment the heartbeat interval
by the busy factor, _b_. If _b_ is small enough to get decremented and we don't
see drops at the new interval, then we stabilize.
Thoughts?
was (Author: amareshwari):
With the patch attached, I ran sort benchmarks on a 390-node cluster and a
120-node cluster. The performance is almost the same as with trunk.
To simulate busyness at the job tracker, I ran the sort benchmarks on a 120-node
cluster with number of handlers=4 and max queue size per handler=10, but
there are drops and lost task trackers both with the patch and without it.
Thus the cluster size factor of (clusterSize/50+1) is fine, but the busy factor
has to be tuned more.
I propose the following to tune the busy factor:
thresholdDropCount = clusterSize/10;
if (dropCount > thresholdDropCount) {
    busyFactor += (dropCount/thresholdDropCount)*HEARTBEAT_BUSY_FACTOR;
}
If the job tracker is not busy for '_notBusyPeriod_', then we will decrement
busyFactor by HEARTBEAT_BUSY_FACTOR.
We have 2 RPCs to be processed at the job tracker, i.e. heartbeat and task
completion events. Let the processing time for an RPC be 2 seconds.
Here, notBusyPeriod is calculated as:
notBusyPeriod = (clusterSize/#handlers)*processingTime*2;
To stabilize:
consider that we have drops at time t, and we increment the heartbeat interval
by the busy factor, _b_. If _b_ is small enough and we don't see drops at the
new interval, then we stabilize.
Thoughts?
> the heartbeat and task event queries interval should be set dynamically by
> the JobTracker
> -----------------------------------------------------------------------------------------
>
> Key: HADOOP-1900
> URL: https://issues.apache.org/jira/browse/HADOOP-1900
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Amareshwari Sri Ramadasu
> Attachments: patch-1900.txt, patch-1900.txt
>
>
> The JobTracker should dynamically scale the intervals that the TaskTrackers
> use to contact it, based on how busy it is and the size of the
> cluster.