Re: hive on tez optimize MRR to MR?

Gopal Vijayaraghavan Wed, 22 Apr 2015 02:12:11 -0700

To prevent bad reducer merging, the reducer merging only kicks in when the
optimizer thinks it gets a perf boost.

MRR -> MR is not a big win when it comes to Tez, due to container reuse;
going wide on the large cardinality when map-side aggregation is missing
will be safer.

If hive.map.aggr=true and the userid set fits within memory, then smushing
the reducers would be nicer.

To reset the wide-narrow checks, do

set hive.optimize.reducededuplication.min.reducer=1;

But be aware that it will fail (I've seen full disks) as you scale upwards
to the 10+ TB cases.
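
For reference, a minimal session sketch for checking whether the plan
collapses. It assumes the u_data table and userid column from the question
quoted below, and that the session runs on Tez; the default of 4 mentioned
in the comment is from memory and may differ in your release, so treat this
as an illustration rather than a recipe.

```sql
-- Sketch only: u_data / userid are taken from the quoted question.
set hive.execution.engine=tez;

-- Map-side partial aggregation; without it the first reducer stage
-- sees every input row, which is when going wide is the safer plan.
set hive.map.aggr=true;

-- Reduce deduplication is skipped when the reducer count falls below
-- this threshold (assumed default: 4). Setting it to 1 effectively
-- removes the check.
set hive.optimize.reducededuplication.min.reducer=1;

-- Inspect the plan instead of running the query: look at how many
-- reducer vertices the group by + order by ends up with.
explain
select userid, count(*) from u_data group by userid order by userid;
```

Comparing the explain output with the setting at its default and at 1
should show whether the two reducer stages get merged for this query.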

Cheers,
Gopal

On 4/22/15, 2:15 PM, "[email protected]" <[email protected]> wrote:

>
>
>select userid,count(*) from u_data group by userid order by userid
>will produce MRR.
>
>I think when the result of userid,count(*) is small (one reducer can
>process the result), this query plan could be optimized to MR?
>
>
>
>
>[email protected]
