[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12919116#action_12919116 ] Joydeep Sen Sarma commented on MAPREDUCE-1463: --

On a somewhat different note: I frequently see reducers not being scheduled (to wait for map completions) even when the cluster has tons of idle reduce slots. That makes no sense (especially when preemption is enabled), and it suggests that some of the heuristics should take cluster load into account.

Reducer should start faster for smaller jobs
Key: MAPREDUCE-1463
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1463
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: jobtracker
Reporter: Scott Chen
Assignee: Scott Chen
Attachments: MAPREDUCE-1463-v1.patch, MAPREDUCE-1463-v2.patch, MAPREDUCE-1463-v3.patch

Our users often complain about the slowness of smaller ad-hoc jobs. The overhead of waiting for the reducers to start is significant in this case. It would be good if we could start the reducers sooner.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851983#action_12851983 ] Scott Chen commented on MAPREDUCE-1463: ---

I think improving the timing for launching reducers is not just for small jobs. In the case of the FairScheduler, for larger jobs with 1+ mappers, the mappers need several batches to be fully scheduled. In that case, if we launch the reducers when 5% of the mappers have finished, those reducers will just be idling. Here is the trade-off: if we launch the reducers too late, we lose the parallelism between mapper execution and reducer shuffling; but if we launch them too early, we waste reducer slots because they have to wait for the mappers to finish. The optimal case is to launch the reducers as late as possible while the reducer shuffle phase still finishes right after the last mapper finishes. The goal is to somehow estimate the mapper finish time from the information we have and launch the reducers at the right moment. I think this decision should depend on the TaskScheduler, because different scheduling policies affect the mapper finish time. Thoughts?
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835041#action_12835041 ] Scott Chen commented on MAPREDUCE-1463: ---

@Todd: That is a good point. If the reducer can catch up with the mapper later on, then there is no harm in the delay at the beginning.

@Arun: I know you don't like this because it increases the complexity. Would you feel more comfortable if we moved this just inside the fairscheduler?
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835046#action_12835046 ] dhruba borthakur commented on MAPREDUCE-1463: -

@Arun: Are you suggesting that the job submission process first generate the input splits, then determine whether the number of map tasks is smaller than a certain value, and if so set mapreduce.job.reduce.slowstart.completedmaps to zero?
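The client-side approach dhruba describes could look roughly like the sketch below. This is purely illustrative: `RushSmallJobs`, `maybeRushReducers`, and the threshold are hypothetical names, and a plain `Properties` stands in for the Hadoop `Configuration` so the sketch is self-contained; only the property key is the real one.

```java
import java.util.Properties;

// Hypothetical sketch of tweaking slowstart at job submission, after the
// input splits have been generated. Not actual Hadoop submission code.
public class RushSmallJobs {
    static final String SLOWSTART_KEY =
        "mapreduce.job.reduce.slowstart.completedmaps";

    /** If the job is "small" (few map tasks), start reducers immediately. */
    static void maybeRushReducers(Properties conf, int numSplits,
                                  int smallJobMapThreshold) {
        if (numSplits <= smallJobMapThreshold) {
            // Small job: don't wait for any maps before scheduling reduces.
            conf.setProperty(SLOWSTART_KEY, "0.0");
        }
    }
}
```

Larger jobs would pass through untouched and keep whatever slowstart value the site or user configured.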
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835076#action_12835076 ] Arun C Murthy commented on MAPREDUCE-1463: --

Actually, assuming we have a reasonable model, I'd do it on the JobTracker, maybe in JobInProgress.initTasks i.e. during job-initialization.
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834371#action_12834371 ] Scott Chen commented on MAPREDUCE-1463: ---

@Amar: Sorry for the late reply. I just got back from vacation.

About your long-running-mapper argument: I think you are right. Using task counts is not sufficient. Maybe we need more information than task counts to determine when to delay the reducers. Can you give me some suggestions?

Setting mapreduce.job.reduce.slowstart.completedmaps to zero does reduce the latency, but it hurts reducer utilization. I think the trade-off here is that we want to delay the reducers to increase reducer utilization, but we also want to minimize the impact of that delay for smaller jobs, because the delay is significant for small jobs but acceptable for large ones. So these two cases should be treated differently. There should be a way to balance reducer utilization and small-job latency. Thoughts?
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834435#action_12834435 ] Todd Lipcon commented on MAPREDUCE-1463: -

Stepping back a bit to think about the model. Correct me if you disagree: our end goal is that the reducers finish fetching map output as soon as possible after the last mapper finishes, but that the reducers are started as late as possible, so they don't occupy slots and hurt utilization.

So, let's assume that the mappers generate data at some rate M. The reducers can fetch data at some maximum rate R. It's the ratio M/R that determines when to start the reducers fetching. For example, if the mappers generate data faster than the reducers can fetch it, it behooves us to start the reduce fetch immediately when the job starts. If the reducers can fetch twice as fast as the mappers can output, we want to start the reducers halfway through the map phase.

Since both kinds of tasks have some kind of startup cost, it's as if the average rate is slowed down by a factor determined by the number of tasks. In the case of 200 mappers and 1 reducer, it's as if the map output speed has been lowered (since the fixed costs of the map tasks slow down map completion), and thus we can afford to wait until later to start the reducer. If you have 1 mapper and 1 reducer, even for the exact same job, the ratio swings as if the map side were outputting faster, and thus we want to start the reduce early.

This is of course a much-simplified model, but I think it's worth discussing this in somewhat abstract terms before we discuss the implementation details. One factor I'm ignoring above is the limiting that the reducer does with respect to particular hosts; that is to say, the reducer fetch speed varies with the number of unique hosts, not just the number of mappers.
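The rate model above can be worked out numerically. The sketch below is illustrative only (`SlowstartModel`, `startFraction`, and the rate parameters are hypothetical names, not anything from the patch): if mappers produce at rate M for the whole map phase of length T, the shuffle needs (M*T)/R time to fetch M*T bytes, so starting it at time T(1 - M/R), i.e. at fraction max(0, 1 - M/R) of the map phase, lets it finish right as the last map does.

```java
// Standalone sketch of the M/R rate model for choosing a slowstart fraction.
public class SlowstartModel {
    /** Fraction of the map phase to wait before starting reducers. */
    static double startFraction(double mapOutputRate, double reduceFetchRate) {
        if (reduceFetchRate <= 0) {
            throw new IllegalArgumentException("fetch rate must be positive");
        }
        // Shuffle takes (M*T)/R for M*T bytes; start at T - (M*T)/R,
        // i.e. fraction 1 - M/R into the map phase, clamped at 0.
        return Math.max(0.0, 1.0 - mapOutputRate / reduceFetchRate);
    }

    public static void main(String[] args) {
        // Reducers fetch twice as fast as mappers produce: start halfway.
        System.out.println(startFraction(1.0, 2.0)); // 0.5
        // Mappers outpace the reducers: start the fetch immediately.
        System.out.println(startFraction(3.0, 2.0)); // 0.0
    }
}
```

This reproduces both of Todd's examples: M/R = 1/2 gives a start fraction of 0.5, and M >= R gives 0 (start at job start). The per-task startup costs and per-host fetch limiting he mentions would show up as adjustments to the effective M and R.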
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831577#action_12831577 ] Arun C Murthy commented on MAPREDUCE-1463: --

-1 These knobs seem backwards. As both Todd and Amar have pointed out, we could add heuristics to tweak mapreduce.job.reduce.slowstart.completedmaps automatically without adding more config knobs.
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831797#action_12831797 ] Scott Chen commented on MAPREDUCE-1463: ---

@Todd: Yes, you're right. The logic in the patch is wrong; the one you posted is the correct logic. Sorry about the mistake.

@Amar:
{quote}
How do you define small jobs? Shouldn't it be based on the total number of tasks instead of considering maps and reduces individually?
{quote}
We want to start reducers faster in both the fewer-mapper and fewer-reducer cases: in the fewer-reducer case, starting the reducers earlier is cheap anyway, and in the fewer-mapper case, the mappers finish faster. But I think it may not be a bad idea to take the total instead (it is simpler, at least).
{quote}
Why do we need a special case for small jobs? If it's for fairness, then this piece of code rightly belongs in contrib/fairscheduler, no? If not for fairness, then what is the problem with the current framework w.r.t. small jobs?
{quote}
Handling the special case for small jobs decreases their overall latency, which gives the users a better experience.
{quote}
Can it be fixed by simple (configuration-like) tweaking? If not, then what's the right fix?
{quote}
For experienced users, setting completedmaps=0 does fix this problem. But it would be nice if this could be done automatically for other users who do not know how to configure Hadoop.

@Arun: Thanks for the comments. I agree: tweaking mapreduce.job.reduce.slowstart.completedmaps on the job client side would be a cleaner way to do this. Experienced users can set completedmaps to 0 on the client side to make their small jobs finish faster, but it would be nice if some automatic decision could be made here so that normal users don't have to learn an extra parameter.

The point is that in some cases (a small job, with a small number of mappers or reducers) we should not spend time waiting for the reducers to start, because the waiting time is significant (or because it is cheap to start the reducers earlier). Automatically reducing the latency makes our users happy.
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831894#action_12831894 ] Amar Kamat commented on MAPREDUCE-1463: ---

What should the behavior be when the total number of maps and reducers is small (i.e. a small job, for now) but the tasks take a huge amount of time to finish? For example, the maps take a day to run while the reduces are also compute-intensive. In such a case, would we still consider the job a small job? I think what we want to capture is the job behavior (fast *finishing* jobs versus others). Using task counts might not be sufficient.

Scott, wouldn't this problem be solved if you set 'mapreduce.job.reduce.slowstart.completedmaps' to a default value of 0 (instead of 0.5) for all your users?
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830502#action_12830502 ] Amar Kamat commented on MAPREDUCE-1463: ---

Scott,
- How do you define small jobs? Shouldn't it be based on the total number of tasks instead of considering maps and reduces individually?
- Why do we need a special case for small jobs? If it's for fairness, then this piece of code rightly belongs in contrib/fairscheduler, no?
- If not for fairness, then what is the problem with the current framework w.r.t. small jobs?
- Can it be fixed by simple (configuration-like) tweaking?
- If not, then what's the right fix?

Wouldn't the reducers be scheduled faster if 'mapreduce.job.reduce.slowstart.completedmaps' is set to 0? If not, can we change the slowstart feature to get it right?
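For reference, the configuration-only tweak Amar is asking about would be a site-wide override along these lines (a sketch of the mapred-site.xml entry, assuming the property name used throughout this thread):

```xml
<!-- mapred-site.xml: schedule reducers without waiting for any completed maps -->
<property>
  <name>mapreduce.job.reduce.slowstart.completedmaps</name>
  <value>0.0</value>
</property>
```

The trade-off discussed elsewhere in the thread applies: this minimizes small-job latency for every job, at the cost of reducers occupying slots while large jobs' maps are still running.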
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830332#action_12830332 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1463: ---

How would you define small?
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830338#action_12830338 ] Todd Lipcon commented on MAPREDUCE-1463: -

Is this basically about changing the slowstart to be nonlinear? I.e., instead of just starting reducers when x% of maps are complete, factor in the total number of maps in the job as well?
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830379#action_12830379 ] Todd Lipcon commented on MAPREDUCE-1463: -

Why not integrate this directly into JobInProgress.scheduleReduces() rather than in the fairscheduler? This should be a generally useful feature.
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830380#action_12830380 ] Scott Chen commented on MAPREDUCE-1463: ---

@Todd: Yes, that's a great idea. And this should logically be in scheduleReduces(). I will repost the patch soon.
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830381#action_12830381 ] Todd Lipcon commented on MAPREDUCE-1463: -

Cool. The other thing I noticed is that the new configurations should be documented in mapred-site.xml (since you're moving to mapred proper).
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830403#action_12830403 ] Scott Chen commented on MAPREDUCE-1463: ---

I followed Todd's suggestion to integrate this in JobInProgress.scheduleReduces() and updated the patch. I will do the documentation and unit test soon. Thanks for the suggestions.
[jira] Commented: (MAPREDUCE-1463) Reducer should start faster for smaller jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830406#action_12830406 ] Todd Lipcon commented on MAPREDUCE-1463: -

I think the transferred logic is wrong. Shouldn't it be:
{code}
return numMapTasks <= reduceRushMapsThreshold
    || numReduceTasks <= reduceRushReducesThreshold
    || finishedMapTasks >= completedMapsForReduceSlowstart;
{code}
Also, I'm not sure that the design is quite right. If I have 1 map but 200 reduces, I don't want to rush the reduces, do I? That is to say, should the condition be && between the two rush parameters, or ||?
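Todd's &&-versus-|| question can be made concrete with a standalone sketch. The threshold names follow his snippet above, but the class and methods here are illustrative, not part of the patch:

```java
// Standalone comparison of the two rush policies discussed above.
public class RushPolicy {
    /** Rush if EITHER side of the job is small (the patch's || form). */
    static boolean rushWithOr(int maps, int reduces,
                              int mapThresh, int redThresh) {
        return maps <= mapThresh || reduces <= redThresh;
    }

    /** Rush only if BOTH sides are small (the && alternative Todd raises). */
    static boolean rushWithAnd(int maps, int reduces,
                               int mapThresh, int redThresh) {
        return maps <= mapThresh && reduces <= redThresh;
    }

    public static void main(String[] args) {
        // Todd's example: 1 map but 200 reduces, thresholds of 10 each.
        // With ||, the 200 reduces are rushed; with &&, they are not.
        System.out.println(rushWithOr(1, 200, 10, 10));  // true
        System.out.println(rushWithAnd(1, 200, 10, 10)); // false
    }
}
```

Under the || form, a single small side is enough to rush; under &&, the 1-map/200-reduce job waits for the normal slowstart threshold instead of grabbing 200 reduce slots early, which is the behavior Todd's comment argues for.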