determine number of reducers before MR plan optimizations are done
------------------------------------------------------------------
Key: PIG-2295
URL: https://issues.apache.org/jira/browse/PIG-2295
Project: Pig
Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Thejas M Nair
The MR plan optimization rules base their decisions on the requested
parallelism, i.e. the value specified by the user. If the user has not
specified the number of reducers, it is determined later from the input data
size. The optimization rules never see this final number of reducers, and as a
result the plans are suboptimal in two cases:
1. If the user has not specified parallelism and the parallelism heuristic sets
it to 1, the LimitAdjuster ends up introducing an unnecessary extra MR job.
2. If the user has not specified parallelism and the parallelism heuristic sets
it higher than pig.files.concatenation.threshold, the extra concatenation job
for FRJoin does not get added. The check is in MRCompiler.visitFRJoin().
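
The two cases above can be illustrated with a small standalone sketch. This is
not Pig's actual code: the class ReducerEstimate and all of its methods are
hypothetical, the use of -1 to mean "parallelism not specified" is an
assumption, and the size-based heuristic is simplified to a ceiling division
capped at a maximum. The sketch only shows how a decision taken on the raw
requested value can differ from the one taken on the resolved value.

```java
// Hypothetical illustration only; none of these names exist in Pig.
class ReducerEstimate {

    // Simplified size-based heuristic: ceil(inputBytes / bytesPerReducer),
    // at least 1 and at most maxReducers.
    static int estimateReducers(long inputBytes, long bytesPerReducer, int maxReducers) {
        if (inputBytes < 0 || bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("invalid estimation parameters");
        }
        long needed = inputBytes / bytesPerReducer
                + (inputBytes % bytesPerReducer != 0 ? 1 : 0);
        needed = Math.max(1L, needed);
        return (int) Math.min(needed, (long) maxReducers);
    }

    // Resolve the parallelism up front: a positive requested value wins,
    // otherwise fall back to the estimate (assumes <= 0 means "unspecified").
    static int effectiveParallelism(int requested, long inputBytes,
                                    long bytesPerReducer, int maxReducers) {
        return requested > 0
                ? requested
                : estimateReducers(inputBytes, bytesPerReducer, maxReducers);
    }

    // Case 1: an extra job for LIMIT is only needed with more than one reducer.
    static boolean limitNeedsExtraJob(int parallelism) {
        return parallelism != 1;
    }

    // Case 2: the comparison described for FRJoin in this issue.
    static boolean frJoinNeedsConcatJob(int parallelism, int concatThreshold) {
        return parallelism > concatThreshold;
    }

    public static void main(String[] args) {
        int requested = -1;                 // user did not specify parallelism
        long bytesPerReducer = 1_000_000_000L;
        int maxReducers = 999;

        // Case 1: small input, the heuristic resolves to a single reducer.
        int small = effectiveParallelism(requested, 500_000_000L, bytesPerReducer, maxReducers);
        System.out.println("extra LIMIT job, raw value:      " + limitNeedsExtraJob(requested));
        System.out.println("extra LIMIT job, resolved value: " + limitNeedsExtraJob(small));

        // Case 2: large input, the heuristic exceeds the threshold.
        int large = effectiveParallelism(requested, 250_000_000_000L, bytesPerReducer, maxReducers);
        System.out.println("concat job, raw value:      " + frJoinNeedsConcatJob(requested, 100));
        System.out.println("concat job, resolved value: " + frJoinNeedsConcatJob(large, 100));
    }
}
```

With the raw value of -1, the first check asks for an extra job that the
resolved value of 1 shows to be unnecessary, and the second check skips a
concatenation job that the resolved value of 250 would trigger. Resolving the
number of reducers before the optimization rules run removes both mismatches.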