To avoid bad reducer merges, reducer merging only kicks in when the optimizer thinks it will get a perf boost.
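
A rough way to see that check in action is to compare EXPLAIN output for the query quoted below under two values of hive.optimize.reducededuplication.min.reducer. This is only a sketch: it assumes the u_data table from the question, and it assumes the starting threshold is 4 (the default in the releases I'm aware of; check your own version). The plans you get may differ depending on the Hive version and execution engine.

```sql
-- Assumed starting point: reduce deduplication enabled, threshold at 4.
-- ORDER BY forces a single reducer, which is below the threshold,
-- so the optimizer should leave the two reduce stages separate (MRR).
SET hive.optimize.reducededuplication=true;
SET hive.optimize.reducededuplication.min.reducer=4;

EXPLAIN
SELECT userid, count(*) FROM u_data GROUP BY userid ORDER BY userid;

-- Lower the threshold so a single-reducer merge is allowed,
-- then compare the number of reduce stages in the new plan.
SET hive.optimize.reducededuplication.min.reducer=1;

EXPLAIN
SELECT userid, count(*) FROM u_data GROUP BY userid ORDER BY userid;
```

Comparing the two plans on a small table first is the safer route, since the merged plan pushes everything through one reducer.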

MR -> MRR is not a big win when it comes to Tez, due to container reuse. If map-side aggregation is missing, going wide on the large cardinality is the safer choice.

If hive.map.aggr=true and the userid set fits within memory, then smushing the reducers would be nicer.

To reset the wide-narrow checks, do

    set hive.optimize.reducededuplication.min.reducer=1;

But be aware that it will fail (I've seen full disks) as you scale upwards to the 10+ TB cases.

Cheers,
Gopal

On 4/22/15, 2:15 PM, "r7raul1...@163.com" <r7raul1...@163.com> wrote:

>select userid,count(*) from u_data group by userid order by userid
>will produce MRR.
>
>I think when the result of userid,count(*) is small (one reducer can
>process the result), this query plan can be optimized to MR?
>
>r7raul1...@163.com