determine number of reducers before MR plan optimizations are done
------------------------------------------------------------------

                 Key: PIG-2295
                 URL: https://issues.apache.org/jira/browse/PIG-2295
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.9.0
            Reporter: Thejas M Nair


The MR plan optimization rules use the requested parallelism (the value 
specified by the user). If the user has not specified the number of reducers, 
it is determined later from the input data size. The optimization rules never 
see this final number of reducers, and as a result the plans are suboptimal in 
two cases:
1. If the user has not specified parallelism, and the parallelism heuristic 
sets parallelism to 1, the LimitAdjuster ends up introducing an unnecessary 
extra MR job.

2. If the user has not specified parallelism and the parallelism heuristic sets 
parallelism higher than pig.files.concatenation.threshold, the extra 
concatenation job in the FRJoin case does not get added. The check is in 
MRCompiler.visitFRJoin().
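
A minimal sketch of the proposed ordering, assuming the heuristic divides the 
total input size by a bytes-per-reducer setting and caps the result at a 
maximum. The class and method names below are hypothetical and are not Pig's 
actual code; the two decision helpers only restate the conditions described in 
cases 1 and 2 above.

```java
// Hypothetical illustration only: none of these names exist in Pig.
public class ReducerEstimateSketch {

    // Assumed heuristic: ceil(inputBytes / bytesPerReducer), clamped to [1, maxReducers].
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        long needed = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, Math.min((long) maxReducers, needed));
    }

    // Resolve the final parallelism BEFORE any optimization rule runs.
    // A non-positive requested value is treated as "not specified by the user".
    static int effectiveParallelism(int requested, long totalInputBytes,
                                    long bytesPerReducer, int maxReducers) {
        return requested > 0
                ? requested
                : estimateReducers(totalInputBytes, bytesPerReducer, maxReducers);
    }

    // Case 1: with a single reducer, the extra MR job for LIMIT is unnecessary.
    static boolean limitNeedsExtraJob(int parallelism) {
        return parallelism != 1;
    }

    // Case 2: the concatenation job is wanted once parallelism exceeds the threshold.
    static boolean frJoinNeedsConcatJob(int parallelism, int concatThreshold) {
        return parallelism > concatThreshold;
    }

    public static void main(String[] args) {
        long bytesPerReducer = 1_000_000_000L; // example value, an assumption
        int maxReducers = 999;                 // example value, an assumption
        int concatThreshold = 100;             // example value, an assumption

        // Small input, parallelism not requested by the user.
        int small = effectiveParallelism(-1, 500L, bytesPerReducer, maxReducers);
        System.out.println("small input: parallelism=" + small
                + ", extra limit job=" + limitNeedsExtraJob(small));

        // Large input, parallelism not requested by the user.
        int large = effectiveParallelism(-1, 250_000_000_000L, bytesPerReducer, maxReducers);
        System.out.println("large input: parallelism=" + large
                + ", concat job=" + frJoinNeedsConcatJob(large, concatThreshold));
    }
}
```

The point of the sketch is only the ordering: once the effective parallelism is 
resolved first, rules such as the LimitAdjuster and the FRJoin check can act on 
the number of reducers that will actually run.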

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
