If there is just one reducer, there is no need for sampling (PIG-2784). But
when an order by runs with more than one reducer, you need to sample the data
first and determine the partition ranges, so that each reducer gets its own
range of keys and you can do a distributed order by.
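
A minimal sketch of the idea in Python, for illustration only. This is not
Pig's actual code, and all function names and the sample_size parameter are
hypothetical. The first step stands in for the sampling job, the second for
the sort job that routes each key to a reducer by range:

```python
import bisect
import random


def partition_boundaries(sample, num_reducers):
    """Pick num_reducers - 1 cut points from a sample of the keys."""
    ordered = sorted(sample)
    if not ordered:
        return []  # nothing to sample, so everything goes to reducer 0
    return [ordered[len(ordered) * i // num_reducers]
            for i in range(1, num_reducers)]


def reducer_for(key, boundaries):
    """Route a key to the reducer whose key range contains it."""
    return bisect.bisect_right(boundaries, key)


def distributed_order_by(keys, num_reducers, sample_size=100, seed=0):
    rng = random.Random(seed)
    # Step 1 (the extra job): sample the keys and derive partition ranges.
    # With a single reducer there are no cut points, so sampling is moot.
    sample = rng.sample(keys, min(sample_size, len(keys)))
    boundaries = partition_boundaries(sample, num_reducers)
    # Step 2 (the sort job): send each key to its reducer by range,
    # then let every reducer sort only its own partition.
    partitions = [[] for _ in range(num_reducers)]
    for key in keys:
        partitions[reducer_for(key, boundaries)].append(key)
    return [sorted(part) for part in partitions]
```

Because the ranges do not overlap, concatenating the reducer outputs in
reducer order gives a fully sorted result, and the sample is what keeps the
partitions roughly balanced.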

Regards,
Rohini


On Thu, May 22, 2014 at 10:37 AM, Ruoyu Liu <[email protected]> wrote:

> Hi all,
>
> I’m looking at the execution process of several operations and have a
> question that may be naive; I hope that someone can help me.
> For operations like Order by, why do we use an extra MR job to sample
> the data? In a Java implementation, we can always use one MR job
> to implement the operation.
>
> Thank you for your time!!
>
> Best,
> Ruoyu
