[jira] [Commented] (MAPREDUCE-2636) Scheduling over disks horizontally

2012-12-21 Thread Qinghe Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537755#comment-13537755
 ] 

Qinghe Jin commented on MAPREDUCE-2636:
---

Hi Steve, although the number of disks may be several times the number of 
nodes, I think it takes only a few extra bits to identify a disk. Does it 
really matter that much? 

It's a good idea to consider output and intermediate data, but do we need to 
think about it for each task? I think the best configuration is to ensure the 
locality of each task, meaning it reads from and writes to the same disk. That 
also makes more sense to the scheduler or the user. 

Conflict detection is necessary. If we rush to assign tasks to busy nodes, 
it not only harms the running tasks but also causes load imbalance. For 
conflict detection, there are two ways: 1. find out how many tasks are running 
on the node; 2. monitor the actual usage of different resources (for disks, we 
can use disk waiting time). I prefer the second method, because there may be 
more than one Hadoop deployment.
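
As a rough sketch of the second method (this is not Hadoop code; DiskStats, 
pick_disk, io_wait_ms and the 50 ms threshold are all made-up names and values 
for illustration), conflict detection based on measured disk waiting time 
could look like this:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DiskStats:
    """Hypothetical per-disk sample reported by a node."""
    node: str
    disk: str
    io_wait_ms: float  # assumed metric: average I/O wait over the last sampling window


def pick_disk(candidates: Iterable[DiskStats],
              max_io_wait_ms: float = 50.0) -> Optional[DiskStats]:
    """Return the least-busy candidate disk, or None if every disk is too busy.

    A disk whose measured wait exceeds max_io_wait_ms counts as a conflict,
    regardless of which deployment generated the load.
    """
    idle_enough = [d for d in candidates if d.io_wait_ms <= max_io_wait_ms]
    if not idle_enough:
        return None  # conflict on every candidate: defer the task instead of piling on
    return min(idle_enough, key=lambda d: d.io_wait_ms)


stats = [
    DiskStats("node1", "sda", 80.0),
    DiskStats("node1", "sdb", 12.5),
    DiskStats("node2", "sda", 30.0),
]
chosen = pick_disk(stats)
```

Because the decision uses measured waiting time rather than a task count, load 
from other deployments on the same disks is taken into account as well.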



> Scheduling over disks horizontally
> --
>
> Key: MAPREDUCE-2636
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2636
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: job submission
> Reporter: Evert Lammerts
> Priority: Minor
>
> Based on this message: 
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201106.mbox/browser
> The JT schedules tasks on nodes based on metadata it gets from the NN. The 
> namenode does not know on which disk a block resides. It might happen that on 
> a node running 4 tasks, all read from the same disk. This can affect 
> performance.
> An optimization might be to schedule horizontally over disks instead of 
> nodes. Any ideas?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-295) Jobtracker leaves tasktrackers underutilized

2012-12-13 Thread Qinghe Jin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qinghe Jin updated MAPREDUCE-295:
-

Description: For some workloads, the jobtracker doesn't keep all the slots 
utilized even under heavy load.  (was: For some workloads, the jobtracker 
doesn't keep all the slots utilized even under heavy load.l;)

> Jobtracker leaves tasktrackers underutilized
> 
>
> Key: MAPREDUCE-295
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-295
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Environment: 2x HT 2.8GHz Intel Xeon, 3GB RAM, 4x 250GB HD linux 
> boxes, 100 node cluster
> Reporter: Khaled Elmeleegy
> Attachments: hadoop-khaled-tasktracker.10s.uncompress.timeline.pdf, 
> hadoop-khaled-tasktracker.150ms.uncompress.timeline.pdf, jobtracker20.patch, 
> jobtracker.patch
>
>
> For some workloads, the jobtracker doesn't keep all the slots utilized even 
> under heavy load.



[jira] [Updated] (MAPREDUCE-295) Jobtracker leaves tasktrackers underutilized

2012-12-13 Thread Qinghe Jin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qinghe Jin updated MAPREDUCE-295:
-

Description: For some workloads, the jobtracker doesn't keep all the slots 
utilized even under heavy load.l;  (was: For some workloads, the jobtracker 
doesn't keep all the slots utilized even under heavy load.)

> Jobtracker leaves tasktrackers underutilized
> 
>
> Key: MAPREDUCE-295
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-295
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Environment: 2x HT 2.8GHz Intel Xeon, 3GB RAM, 4x 250GB HD linux 
> boxes, 100 node cluster
> Reporter: Khaled Elmeleegy
> Attachments: hadoop-khaled-tasktracker.10s.uncompress.timeline.pdf, 
> hadoop-khaled-tasktracker.150ms.uncompress.timeline.pdf, jobtracker20.patch, 
> jobtracker.patch
>
>
> For some workloads, the jobtracker doesn't keep all the slots utilized even 
> under heavy load.l;
