To prevent bad reducer merges, reducer deduplication only kicks in when the
optimizer thinks it will get a perf boost.

MR -> MRR is not a big win when it comes to Tez, due to container reuse;
going wide on the high-cardinality key will be safer when map-side
aggregation is missing.

If hive.map.aggr=true and the userid set fits within memory, then smushing
the reducers would be nicer.

To override the wide-narrow checks, do

set hive.optimize.reducededuplication.min.reducer=1;


But be aware that it will fail (I've seen full disks) as you scale upwards
to the 10+ TB cases.
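
For example, to try this on the query from the original question, a session
along these lines should do. This is a sketch that assumes a Tez session and
the u_data table from the question. Compare the EXPLAIN output with and
without the last setting to see whether the two reduce stages collapse into
one; I haven't included expected plan output here.

-- run on Tez, where container reuse already makes MRR cheap
set hive.execution.engine=tez;
-- map-side aggregation shrinks the per-userid data before the shuffle
set hive.map.aggr=true;
-- reducer deduplication (merging the group-by and order-by reduce stages)
set hive.optimize.reducededuplication=true;
-- allow the merge even when the merged stage ends up with 1 reducer
set hive.optimize.reducededuplication.min.reducer=1;

explain
select userid, count(*) from u_data group by userid order by userid;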

Cheers,
Gopal

On 4/22/15, 2:15 PM, "r7raul1...@163.com" <r7raul1...@163.com> wrote:

>
>
>select userid,count(*) from u_data group by userid order by userid
>will produce MRR.
>
>When the result of userid,count(*) is small (one reducer can
>process the result), can this query plan be optimized to MR?
>
>r7raul1...@163.com

