[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-10-07 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919116#action_12919116
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1463:
--

On a somewhat different note: I frequently see reducers not being scheduled 
(because they are waiting for map completions) even when the cluster has tons 
of idle reduce slots. That makes no sense (especially when pre-emption is 
enabled). It suggests that some of the heuristics should take cluster load 
into account.

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, 
 MAPREDUCE-1463-v3.patch


 Our users often complain about the slowness of smaller ad-hoc jobs.
 The overhead to wait for the reducers to start in this case is significant.
 It will be good if we can start the reducer sooner in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-03-31 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851983#action_12851983
 ] 

Scott Chen commented on MAPREDUCE-1463:
---

I think improving the timing for launching reducers is not just for small jobs.
In the case of FairScheduler, for larger jobs with 1+ mappers, the mappers 
need several batches to be fully scheduled.
In this case, if we launch the reducers when 5% of the mappers have finished, 
those reducers will just sit idle.

Here is the trade-off.
If we launch the reducers too late, we lose the parallelism between mapper 
execution and reducer shuffling.
But if we launch the reducers too early, we waste reducer slots because they 
have to wait for the mappers to finish.

The optimal case is that we launch the reducers as late as possible while the 
reducer shuffle phase still finishes right after the last mapper finishes.

The goal is to somehow estimate the mapper finish time based on the information 
we have and launch the reducers at the right moment.
I think this decision should depend on the TaskScheduler because different 
scheduling policies affect the mapper finish time.

Thoughts?




[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-17 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835041#action_12835041
 ] 

Scott Chen commented on MAPREDUCE-1463:
---

@Todd: That is a good point. If the reducer can later catch up with the 
mapper, then there is no harm in the delay at the beginning. 

@Arun: I know you don't like this because it increases the complexity. Would 
you feel more comfortable if we moved this inside the fairscheduler?




[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-17 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835046#action_12835046
 ] 

dhruba borthakur commented on MAPREDUCE-1463:
-

@Arun: are you suggesting that the job submission process first generate the 
input splits, then determine whether the number of map tasks is smaller than a 
certain value, and if so set mapreduce.job.reduce.slowstart.completedmaps to 
zero?
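
A minimal sketch of the submission-time check described above. It assumes a 
plain java.util.Properties object stands in for the job configuration, and the 
class name, method name, and SMALL_JOB_MAX_MAPS cutoff are hypothetical (none 
of them come from the patch):

```java
import java.util.Properties;

/** Sketch: choose a slowstart value at job submission from the map task count. */
final class SlowstartAtSubmit {
    // Property name taken from the thread.
    static final String SLOWSTART_KEY = "mapreduce.job.reduce.slowstart.completedmaps";
    // Hypothetical cutoff; a real value would need tuning.
    static final int SMALL_JOB_MAX_MAPS = 10;

    /** Sets slowstart to 0 for small jobs; leaves the configured value alone otherwise. */
    static void adjustSlowstart(Properties jobConf, int numInputSplits) {
        // One map task is created per input split, so the split count is the map count.
        if (numInputSplits < SMALL_JOB_MAX_MAPS) {
            jobConf.setProperty(SLOWSTART_KEY, "0.0");
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SLOWSTART_KEY, "0.05");
        adjustSlowstart(conf, 3); // 3 splits is below the cutoff
        System.out.println(conf.getProperty(SLOWSTART_KEY));
    }
}
```

Doing this on the client keeps the JobTracker unchanged, at the cost of 
deciding before any runtime information about the job is available.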




[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-17 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835076#action_12835076
 ] 

Arun C Murthy commented on MAPREDUCE-1463:
--

Actually, assuming we have a reasonable model, I'd do it on the JobTracker, 
maybe in JobInProgress.initTasks i.e. during job-initialization.






[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-16 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834371#action_12834371
 ] 

Scott Chen commented on MAPREDUCE-1463:
---

@Amar: Sorry for the late reply. I have just got back from vacation. About your 
long-running mapper argument, I think you are right. Using task counts is not 
sufficient. Maybe we need more information than task counts to determine when 
to delay the reducers. Can you give me some suggestions? Setting 
mapreduce.job.reduce.slowstart.completedmaps to zero does reduce the latency, 
but it hurts the reducer utilization.

I think the trade-off here is that we want to delay the reducers to increase 
the reducer utilization, but we also want to minimize the impact of this delay 
on smaller jobs. The delay is significant for smaller jobs but is OK for large 
jobs, so these two cases should be treated differently. There should be a way 
to balance the reducer utilization and small job latency. Thoughts?

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, 
 MAPREDUCE-1463-v3.patch





[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834435#action_12834435
 ] 

Todd Lipcon commented on MAPREDUCE-1463:


Stepping back a bit to think about the model. Correct me if you disagree:

Our end goal is that the reducers finish fetching map output as soon as 
possible after the last mapper finishes, but that the reducers are started as 
late as possible, so they don't occupy slots and hurt utilization.

So, let's assume that the mappers generate data at some rate M. The reducers 
can fetch data at some maximum rate R. It's the ratio of M/R that determines 
when to start the reducers fetching. For example, if the mappers generate data 
faster than the reducers can fetch it, it behooves us to start the reduce fetch 
immediately when the job starts. If the reducers can fetch twice as fast as the 
mappers can output, we want to start the reducers halfway through the map phase.
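
As a rough illustration of this model (a sketch under stated assumptions, not 
anything from the patch): if the map phase lasts T and the mappers emit data at 
a constant aggregate rate M, the reducers need T*M/R to fetch all of it at rate 
R, so they would start at fraction max(0, 1 - M/R) of the map phase. The class 
and method names are made up for this example, and constant rates and zero 
startup cost are assumed.

```java
/** Sketch of the simplified model: how far into the map phase should reducers start? */
final class SlowstartModel {
    /**
     * Returns the fraction of the map phase that should elapse before the
     * reducers start fetching, given the aggregate map output rate m and the
     * maximum reduce fetch rate r (same units, both constant).
     */
    static double startFraction(double m, double r) {
        if (m <= 0 || r <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        // Fetching everything takes m/r of the map phase; start that long before the end.
        // If the mappers outpace the reducers, clamp to 0 (start immediately).
        return Math.max(0.0, 1.0 - m / r);
    }

    public static void main(String[] args) {
        System.out.println(startFraction(1.0, 2.0)); // reducers twice as fast as mappers
        System.out.println(startFraction(3.0, 2.0)); // mappers faster than reducers
    }
}
```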

Since both kinds of tasks have some kind of startup cost, it's as if the 
average rate is slowed down by a factor that's determined by the number of 
tasks. In the case of 200 mappers and 1 reducer, it's as if the map output 
speed has been lowered (since the fixed costs of the map tasks slow down map 
completion), and thus we can afford to wait until later to start the reducer. 
If you have 1 mapper and 1 reducer, even for the exact same job, the ratio 
swings as if the map side outputs data faster, and thus we want to start the 
reduce early.

This is of course a much simplified model, but I think it's worth discussing 
this in somewhat abstract terms before we discuss the implementation details. 
One factor I'm ignoring above is the limiting that the reducer does with 
respect to particular hosts - that is to say, the reducer fetch speed varies 
with the number of unique hosts, not just the number of mappers.




[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-09 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831577#action_12831577
 ] 

Arun C Murthy commented on MAPREDUCE-1463:
--

-1

These knobs seem backwards - as both Todd and Amar have pointed out we could 
add heuristics to tweak mapreduce.job.reduce.slowstart.completedmaps 
automatically without adding more config knobs.

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch





[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-09 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831797#action_12831797
 ] 

Scott Chen commented on MAPREDUCE-1463:
---

@Todd: 
Yes, you're right. The logic in the patch is wrong. The one you post is the 
correct logic. Sorry about the mistake.

@Amar: 
{quote}
How do you define small jobs? Shouldn't it be based on the total number of 
tasks instead of considering maps and reduces individually?
{quote}
We want to start the reducers sooner in both the few-mapper and few-reducer 
cases: with few reducers, starting them earlier is cheap anyway, and with few 
mappers, the map phase finishes faster.
But I think it may not be a bad idea to take the total instead (it is 
simpler at least). 
{quote}
Why do we need a special case for small jobs? If it's for fairness, then this 
piece of code rightly belongs in contrib/fairscheduler, no?
If not for fairness, then what is the problem with the current framework w.r.t. 
small jobs?
{quote}
Handling the special case for small jobs reduces the overall latency, which 
gives the users a better experience.
{quote}
Can it be fixed by simple (configuration-like) tweaking?
If not, then what's the right fix?
{quote}
For experienced users, setting completedmaps=0 does fix this problem. But it 
would be nice if this could be done automatically for other users who do not 
know how to configure Hadoop.


@Arun: 
Thanks for the comments. I agree. Tweaking 
mapreduce.job.reduce.slowstart.completedmaps on the job client side should be a 
cleaner way to do this. For experienced users, setting completedmaps to 0 on 
the client side will make their small jobs finish faster. But it would be nice 
if some automatic decision could be made here so that normal users don't 
have to learn how to configure an extra parameter.


The point here is that in some cases (small job, small number of mappers or 
reducers) we should not spend time waiting for the reducers to start, 
because the waiting time is significant (or it is cheap to start the reducers 
earlier). Automatically reducing the latency makes our users happy.

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch





[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-09 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831894#action_12831894
 ] 

Amar Kamat commented on MAPREDUCE-1463:
---

What should the behavior be when the total number of maps and reducers is 
small (i.e. a small job for now) but the job takes a huge amount of time to 
finish? For example, the map takes a day to run while the reduces are also 
compute intensive. In such a case, would we still consider the job a small 
job? I think what we want to capture is the job behavior (fast *finishing* job 
versus others). Using task counts might not be sufficient. 

Scott, wouldn't this problem be solved if you set 
'mapreduce.job.reduce.slowstart.completedmaps' to a default value of 0 (instead 
of 0.5) for all your users? 

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, 
 MAPREDUCE-1463-v3.patch





[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-06 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830502#action_12830502
 ] 

Amar Kamat commented on MAPREDUCE-1463:
---

Scott, 
- How do you define small jobs? Shouldn't it be based on the total number of 
tasks instead of considering maps and reduces individually? 
- Why do we need a special case for small jobs? If it's for fairness, then 
this piece of code rightly belongs in contrib/fairscheduler, no?
- If not for fairness, then what is the problem with the current framework 
w.r.t. small jobs?
- Can it be fixed by simple (configuration-like) tweaking?
- If not, then what's the right fix? 

Wouldn't the reducers be scheduled faster if 
'mapreduce.job.reduce.slowstart.completedmaps' is set to 0? If not then can we 
change the slowstart feature to get it right?

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch





[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-05 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830332#action_12830332
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1463:
---

How would you define small?

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen




[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830338#action_12830338
 ] 

Todd Lipcon commented on MAPREDUCE-1463:


Is this basically about changing the slowstart to be nonlinear? I.e., instead 
of just starting reducers when x% of maps are complete, also factor in the 
total number of maps in the job?

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen




[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830379#action_12830379
 ] 

Todd Lipcon commented on MAPREDUCE-1463:


Why not integrate this directly into JobInProgress.scheduleReduces() rather 
than in the fairscheduler? This should be a generally useful feature.

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch





[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-05 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830380#action_12830380
 ] 

Scott Chen commented on MAPREDUCE-1463:
---

@Todd: Yes, that's a great idea. And this should logically be in 
scheduleReduces(). I will repost the patch soon.

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch





[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830381#action_12830381
 ] 

Todd Lipcon commented on MAPREDUCE-1463:


Cool. The other thing I noticed is that the new configurations should be 
documented in mapred-site.xml (since you're moving to mapred proper)

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch





[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-05 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830403#action_12830403
 ] 

Scott Chen commented on MAPREDUCE-1463:
---

Followed Todd's suggestion to integrate this in JobInProgress.scheduleReduces() 
and updated the patch.
I will do the documentation and unit test soon.
Thanks for the suggestions.

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch





[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs

2010-02-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830406#action_12830406
 ] 

Todd Lipcon commented on MAPREDUCE-1463:


I think the transferred logic is wrong. Shouldn't it be:
{code}
return numMapTasks <= reduceRushMapsThreshold ||
    numReduceTasks <= reduceRushReducesThreshold ||
    finishedMapTasks >= completedMapsForReduceSlowstart;
{code}

Also, I'm not sure that the design is quite right. If I have 1 map but 200 
reduces, I don't want to rush the reduces, do I? That is to say, should the 
condition be && between the two rush parameters, or ||?
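
To make the difference concrete, here is a small sketch of both variants. It 
assumes the comparisons in the snippet above are <= for the two rush thresholds 
and >= for the slowstart check. The class, the method names, and the threshold 
values are hypothetical and only for illustration; they are not taken from the 
patch.

```java
/** Sketch of the two candidate conditions for starting reduces early. */
final class ReduceRushCheck {
    // Names follow the snippet in the comment; values are supplied by the caller.
    final int reduceRushMapsThreshold;
    final int reduceRushReducesThreshold;

    ReduceRushCheck(int reduceRushMapsThreshold, int reduceRushReducesThreshold) {
        this.reduceRushMapsThreshold = reduceRushMapsThreshold;
        this.reduceRushReducesThreshold = reduceRushReducesThreshold;
    }

    /** "||" variant: rush if either the map count or the reduce count is small. */
    boolean scheduleReducesOr(int numMapTasks, int numReduceTasks,
                              int finishedMapTasks, int completedMapsForReduceSlowstart) {
        return numMapTasks <= reduceRushMapsThreshold
            || numReduceTasks <= reduceRushReducesThreshold
            || finishedMapTasks >= completedMapsForReduceSlowstart;
    }

    /** "&&" variant: rush only if both counts are small; slowstart still applies. */
    boolean scheduleReducesAnd(int numMapTasks, int numReduceTasks,
                               int finishedMapTasks, int completedMapsForReduceSlowstart) {
        return (numMapTasks <= reduceRushMapsThreshold
                && numReduceTasks <= reduceRushReducesThreshold)
            || finishedMapTasks >= completedMapsForReduceSlowstart;
    }

    public static void main(String[] args) {
        ReduceRushCheck check = new ReduceRushCheck(5, 5);
        // 1 map, 200 reduces, no maps finished yet, slowstart needs 1 finished map.
        System.out.println(check.scheduleReducesOr(1, 200, 0, 1));  // true: reduces are rushed
        System.out.println(check.scheduleReducesAnd(1, 200, 0, 1)); // false: reduces wait
    }
}
```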

 Reducer should start faster for smaller jobs
 

 Key: MAPREDUCE-1463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch

