[jira] [Updated] (MAPREDUCE-2905) CapBasedLoadManager incorrectly allows assignment when assignMultiple is true (was: assignmultiple per job)

2011-10-13 Thread Jeff Bean (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Bean updated MAPREDUCE-2905:
-

Attachment: MR-2905.10-13-2011

Unit test included. The unit test found a typo (assign map vs. assign reduce), 
which has been fixed.

 CapBasedLoadManager incorrectly allows assignment when assignMultiple is true 
 (was: assignmultiple per job)
 ---

 Key: MAPREDUCE-2905
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2905
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/fair-share
Affects Versions: 0.20.2
Reporter: Jeff Bean
 Attachments: MR-2905.10-13-2011, MR-2905.patch, MR-2905.patch.2


 We encountered a situation where, in the same cluster, large jobs benefit from 
 mapred.fairscheduler.assignmultiple but small jobs with few mappers do not: 
 their mappers clump together to fully occupy just a few nodes, which causes 
 those nodes to saturate and become a bottleneck. The desired behavior is to 
 spread such a job across more nodes so that a relatively small job doesn't 
 saturate any node in the cluster.
 Testing has shown that setting mapred.fairscheduler.assignmultiple to false 
 gives the desired behavior for small jobs, but large jobs don't need it (they 
 benefit from assignmultiple). However, since this is a cluster-wide setting, 
 we can't tune it per job.
 It would be useful if jobs could set a parameter similar to 
 mapred.fairscheduler.assignmultiple at submission time to better control the 
 task distribution of a particular job.
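The clumping effect can be illustrated with a toy simulation (hypothetical, not Hadoop code). It assumes identical trackers, round-robin heartbeats, and that with assignmultiple on, a heartbeat is filled up to the tracker's free map slots (as the description implies), while with it off a tracker receives at most one map per heartbeat. The class and method names are made up for illustration.

```java
import java.util.Arrays;

/**
 * Toy model (hypothetical, not Hadoop code) of how assignmultiple affects
 * the spread of a small job's map tasks across task trackers.
 */
public class AssignMultipleSpread {

    /**
     * Returns how many of the job's maps land on each tracker.
     *
     * @param trackers        number of task trackers
     * @param slotsPerTracker free map slots on each tracker
     * @param maps            number of map tasks in the job
     * @param assignMultiple  true: fill every free slot per heartbeat;
     *                        false: at most one map per heartbeat
     */
    public static int[] distribute(int trackers, int slotsPerTracker,
                                   int maps, boolean assignMultiple) {
        if (trackers <= 0 || slotsPerTracker < 0 || maps < 0
                || maps > trackers * slotsPerTracker) {
            throw new IllegalArgumentException("job does not fit in free slots");
        }
        int[] placed = new int[trackers];
        int remaining = maps;
        int tracker = 0;
        // Heartbeats arrive round-robin until every map is placed.
        while (remaining > 0) {
            int free = slotsPerTracker - placed[tracker];
            int perHeartbeat = assignMultiple ? free : Math.min(1, free);
            int assigned = Math.min(perHeartbeat, remaining);
            placed[tracker] += assigned;
            remaining -= assigned;
            tracker = (tracker + 1) % trackers;
        }
        return placed;
    }

    public static void main(String[] args) {
        // 4 trackers with 4 free slots each, and a small 4-map job.
        System.out.println(Arrays.toString(distribute(4, 4, 4, true)));  // [4, 0, 0, 0]
        System.out.println(Arrays.toString(distribute(4, 4, 4, false))); // [1, 1, 1, 1]
    }
}
```

In this model the first tracker to heartbeat absorbs the whole small job when assignmultiple is on, which matches the saturation described above.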

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-2905) CapBasedLoadManager incorrectly allows assignment when assignMultiple is true (was: assignmultiple per job)

2011-10-13 Thread Jeff Bean (Updated) (JIRA)


Jeff Bean updated MAPREDUCE-2905:
-

Attachment: screenshot-1.jpg

Unit test failure exposes the issue. When assignmultiple is true, a load 
manager might be asked to assign 3 maps in a loop, and it allows all of them.
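A simplified model of the failure the unit test exposes. It assumes the cap check compares the running-map count from the tracker's last reported status with a cap of ceil(localMaxTasks * min(1, maxDiff + runnable / totalSlots)); that formula is our reading of CapBasedLoadManager and should be verified against the source, and the class and method names are hypothetical. With 4 runnable maps, 4 local slots and 16 cluster slots the cap is 1, yet all three in-loop requests are approved because the reported count never changes during the loop.

```java
/**
 * Simplified model of the MAPREDUCE-2905 failure (hypothetical names).
 * The cap check reads the running-map count from the heartbeat status,
 * which is not refreshed while the scheduler loops over assignments.
 */
public class StaleCapCheck {

    /** Cap formula as we read it in CapBasedLoadManager (verify against source). */
    public static int getCap(int totalRunnableTasks, int localMaxTasks,
                             int totalSlots, double maxDiff) {
        double load = maxDiff + ((double) totalRunnableTasks) / totalSlots;
        return (int) Math.ceil(localMaxTasks * Math.min(1.0, load));
    }

    /** Stateless check: compares only the reported (possibly stale) count. */
    public static boolean canAssignMap(int reportedRunningMaps, int totalRunnableMaps,
                                       int localMaxMaps, int totalMapSlots) {
        return reportedRunningMaps
            < getCap(totalRunnableMaps, localMaxMaps, totalMapSlots, 0.0);
    }

    /** Replays `attempts` in-loop requests against one heartbeat's status. */
    public static int approvedInLoop(int attempts) {
        int reportedRunningMaps = 0; // from the heartbeat; unchanged during the loop
        int approved = 0;
        for (int i = 0; i < attempts; i++) {
            // 4 runnable maps, 4 local slots, 16 cluster slots: cap = 1
            if (canAssignMap(reportedRunningMaps, 4, 4, 16)) {
                approved++; // task is queued, but the status is not updated
            }
        }
        return approved;
    }

    public static void main(String[] args) {
        System.out.println("cap = " + getCap(4, 4, 16, 0.0));   // prints "cap = 1"
        System.out.println("approved = " + approvedInLoop(3)); // prints "approved = 3"
    }
}
```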

 CapBasedLoadManager incorrectly allows assignment when assignMultiple is true 
 (was: assignmultiple per job)
 ---

 Key: MAPREDUCE-2905
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2905
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/fair-share
Affects Versions: 0.20.2
Reporter: Jeff Bean
 Attachments: MR-2905.10-13-2011, MR-2905.patch, MR-2905.patch.2, 
 screenshot-1.jpg







[jira] [Updated] (MAPREDUCE-2905) CapBasedLoadManager incorrectly allows assignment when assignMultiple is true (was: assignmultiple per job)

2011-10-12 Thread Jeff Bean (Updated) (JIRA)


Jeff Bean updated MAPREDUCE-2905:
-

Attachment: MR-2905.patch.2

Checked "grant for inclusion" (ASF license grant).
Fixed the tab vs. space issue per Harsh.

 CapBasedLoadManager incorrectly allows assignment when assignMultiple is true 
 (was: assignmultiple per job)
 ---

 Key: MAPREDUCE-2905
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2905
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Affects Versions: 0.20.2
Reporter: Jeff Bean
 Attachments: MR-2905.patch, MR-2905.patch.2







[jira] [Updated] (MAPREDUCE-2905) CapBasedLoadManager incorrectly allows assignment when assignMultiple is true (was: assignmultiple per job)

2011-10-12 Thread Jeff Bean (Updated) (JIRA)


Jeff Bean updated MAPREDUCE-2905:
-

Issue Type: Bug  (was: Improvement)

OK. That's different from testing for regressions, but maybe I can do both. I 
have the code; it's just not in JUnit form yet. Stay tuned!

 CapBasedLoadManager incorrectly allows assignment when assignMultiple is true 
 (was: assignmultiple per job)
 ---

 Key: MAPREDUCE-2905
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2905
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/fair-share
Affects Versions: 0.20.2
Reporter: Jeff Bean
 Attachments: MR-2905.patch, MR-2905.patch.2







[jira] [Updated] (MAPREDUCE-2905) CapBasedLoadManager cannot access running tasks (was: assignmultiple per job)

2011-10-05 Thread Jeff Bean (Updated) (JIRA)


Jeff Bean updated MAPREDUCE-2905:
-

Attachment: MR-2905.patch

Please review and validate the approach.

The problem is that when AssignMultiple is turned on, canAssignMap and 
canAssignReduce get called in a loop, and the task tracker status they consult 
goes out of date as new tasks are marked for assignment: tasks are added to the 
list but don't actually get assigned until AssignTasks exits.

The patch modifies CapBasedLoadManager to track the number of times 
canAssignMap and canAssignReduce return true for the same task tracker. It 
assumes that each true result means there is potentially another running task 
that must be counted when deciding whether a new task can be assigned to that 
tracker.
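The counting approach described above can be sketched as follows. This is not the attached patch: the class, the `startHeartbeat` reset hook, keying the counts by tracker name, and modeling `maxDiff` as a constructor argument are all assumptions made for illustration, and the cap formula is our reading of CapBasedLoadManager, which should be checked against the actual source.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the per-tracker counting idea (hypothetical names; not the
 * MR-2905 patch itself). Approvals granted during the current heartbeat
 * are treated as tasks already running on that tracker.
 */
public class TrackingCapCheck {
    private final Map<String, Integer> approvedThisHeartbeat = new HashMap<>();
    private final double maxDiff;

    public TrackingCapCheck(double maxDiff) {
        this.maxDiff = maxDiff;
    }

    /** Cap formula as we read it in CapBasedLoadManager (verify against source). */
    public int getCap(int totalRunnableTasks, int localMaxTasks, int totalSlots) {
        double load = maxDiff + ((double) totalRunnableTasks) / totalSlots;
        return (int) Math.ceil(localMaxTasks * Math.min(1.0, load));
    }

    /** Assumed hook: call when a tracker's new heartbeat arrives, before the loop. */
    public void startHeartbeat(String tracker) {
        approvedThisHeartbeat.remove(tracker);
    }

    /** Counts earlier approvals in this heartbeat as running tasks. */
    public boolean canAssignMap(String tracker, int reportedRunningMaps,
                                int totalRunnableMaps, int localMaxMaps,
                                int totalMapSlots) {
        int pending = approvedThisHeartbeat.getOrDefault(tracker, 0);
        boolean ok = reportedRunningMaps + pending
            < getCap(totalRunnableMaps, localMaxMaps, totalMapSlots);
        if (ok) {
            // Assume the approved task will run; count it for the next call.
            approvedThisHeartbeat.put(tracker, pending + 1);
        }
        return ok;
    }

    /** Replays the unit-test scenario: `attempts` requests in one heartbeat. */
    public static int approvedInLoop(int attempts) {
        TrackingCapCheck lm = new TrackingCapCheck(0.0);
        lm.startHeartbeat("tt1");
        int approved = 0;
        for (int i = 0; i < attempts; i++) {
            // Reported count stays 0; cap = ceil(4 * 4/16) = 1.
            if (lm.canAssignMap("tt1", 0, 4, 4, 16)) {
                approved++;
            }
        }
        return approved;
    }

    public static void main(String[] args) {
        System.out.println("approved = " + approvedInLoop(3)); // prints "approved = 1"
    }
}
```

Some reset point is needed in a design like this: the reported status already includes tasks launched in earlier heartbeats, so carrying the counts forward indefinitely would double-count them.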

 CapBasedLoadManager cannot access running tasks (was: assignmultiple per job)
 -

 Key: MAPREDUCE-2905
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2905
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Affects Versions: 0.20.2
Reporter: Jeff Bean
 Attachments: MR-2905.patch







[jira] [Updated] (MAPREDUCE-2905) CapBasedLoadManager cannot access running tasks (was: assignmultiple per job)

2011-09-27 Thread Jeff Bean (Updated) (JIRA)


Jeff Bean updated MAPREDUCE-2905:
-

Summary: CapBasedLoadManager cannot access running tasks (was: 
assignmultiple per job)  (was: Allow mapred.fairscheduler.assignmultple to be 
set per job)

 CapBasedLoadManager cannot access running tasks (was: assignmultiple per job)
 -

 Key: MAPREDUCE-2905
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2905
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Affects Versions: 0.20.2
Reporter: Jeff Bean

