Prioritize hadoop parameter "mapred.reduce.task" above estimation of reducer number -----------------------------------------------------------------------------------
Key: PIG-1810 URL: https://issues.apache.org/jira/browse/PIG-1810 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.9.0 Anup Point this problem in PIG-1249 {quote} Anup added a comment - 18/Jan/11 07:46 PM one thing that we didn't take care is the use of the hadoop parameter "mapred.reduce.tasks". If I specify the hadoop parameter -Dmapred.reduce.tasks=450 for all the MR jobs , it is overwritten by estimateNumberOfReducers(conf,mro), which in my case is 15. I am not specifying any default_parallel and PARALLEL statements. Ideally, the number of reducer should be 450. I think we should prioritize this parameter above the estimate reducers calculations. The priority list should be 1. PARALLEL statement 2. default_parallel statement 3. mapred.reduce.task hadoop parameter 4. estimateNumberOfreducers(); {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.