The first job simply reads the input and dumps it to HDFS. The first job is needed because:
1. SampleLoader does not work with non-HDFS loaders
2. SampleLoader does not process any operators before "order by"
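
To illustrate reason 2, here is a toy, in-memory Python sketch. It is not Pig code. The function names (`materialize_job`, `sample_job`, `load_per_reducer`, `filter_then_scale`), the uniform random sample, and the integer records are all illustrative assumptions. Job 1 runs the operators that precede the order by (here a hypothetical filter plus key transform) and stores the result. Job 2 samples that stored output to pick range boundaries. If the sampler read the raw input instead, the boundaries would describe the wrong keys and the reducers would be badly unbalanced.

```python
import bisect
import random
from typing import Callable, List, Optional, Sequence


def materialize_job(records: Sequence[int],
                    pre_ops: Callable[[int], Optional[int]]) -> List[int]:
    """Job 1 (map-only): run the operators that precede ORDER BY and store the result.

    pre_ops returns the transformed record, or None if the record is filtered out.
    """
    stored = []
    for r in records:
        v = pre_ops(r)
        if v is not None:
            stored.append(v)
    return stored


def sample_job(data: Sequence[int], num_reducers: int,
               sample_size: int = 1000, seed: int = 0) -> List[int]:
    """Job 2: take a uniform random sample and return num_reducers - 1 split keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(list(data), min(sample_size, len(data))))
    if not sample:
        return []
    # Evenly spaced positions in the sorted sample become the range boundaries.
    return [sample[(i * len(sample)) // num_reducers] for i in range(1, num_reducers)]


def load_per_reducer(keys: Sequence[int], boundaries: List[int]) -> List[int]:
    """Count how many keys each reducer would receive under range partitioning."""
    counts = [0] * (len(boundaries) + 1)
    for k in keys:
        counts[bisect.bisect_right(boundaries, k)] += 1
    return counts


# Hypothetical operators before ORDER BY: drop negatives, then scale the key by 100.
def filter_then_scale(r: int) -> Optional[int]:
    return r * 100 if r >= 0 else None


raw = [5, -3, 9, 2, -8, 7, 0, 4]
stored = materialize_job(raw, filter_then_scale)
good = sample_job(stored, num_reducers=2)  # sampled from what ORDER BY actually sorts
bad = sample_job(raw, num_reducers=2)      # sampled from the raw input instead

print(good, load_per_reducer(stored, good))  # [500] [3, 3]
print(bad, load_per_reducer(stored, bad))    # [4] [1, 5]
```

Storing the pre-processed data first means the sampler only ever has to read plain stored records. That is the trade-off the extra map-only job buys.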

In some cases the first job can be optimized out; see
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer
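
As a rough reading of the two reasons above, the first job is only required when at least one of them applies. The toy Python check below restates that. `OrderByInput` and `needs_materialize_job` are hypothetical names, not Pig classes. The actual decision is made by SampleOptimizer, whose logic this sketch does not reproduce.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrderByInput:
    """Hypothetical summary of what feeds an ORDER BY (not a Pig class)."""
    loader_is_hdfs: bool                                           # can the sample loader wrap this loader?
    ops_before_order_by: List[str] = field(default_factory=list)  # e.g. ["FILTER"]


def needs_materialize_job(inp: OrderByInput) -> bool:
    """True if a map-only job must first store the input to HDFS."""
    # Reason 1: the sample loader cannot read through a non-HDFS loader.
    # Reason 2: the sample loader would skip operators that precede ORDER BY.
    return not inp.loader_is_hdfs or len(inp.ops_before_order_by) > 0


plain = OrderByInput(loader_is_hdfs=True)
filtered = OrderByInput(loader_is_hdfs=True, ops_before_order_by=["FILTER"])
external = OrderByInput(loader_is_hdfs=False)
print(needs_materialize_job(plain), needs_materialize_job(filtered), needs_materialize_job(external))
# False True True
```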

Thanks,
Daniel

On Thu, May 22, 2014 at 8:04 PM, Ruoyu Liu <[email protected]> wrote:
> However, if the order by is on multiple keys, there are three
> jobs. The second and third jobs are similar to the two jobs
> for an order by on one key. Can anyone explain what the first map-only job does?
> Also, can anyone point me to the right place to learn the execution details
> of the various Pig operations?
>
> Thanks!!
> Ruoyu
>
> On May 22, 2014, at 1:32 PM, Rohini Palaniswamy <[email protected]> 
> wrote:
>
>> If there is just one reducer, there is no need for sampling (PIG-2784), but
>> when there is more than one reducer in an order by, you need to sample the data
>> and determine the partition ranges so that you can do a distributed order by.
>>
>> Regards,
>> Rohini
>>
>>
>> On Thu, May 22, 2014 at 10:37 AM, Ruoyu Liu <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I’m looking at the execution process of several operations and have a
>>> question that may be naive; I hope someone can help me.
>>> For operations like order by, why do we use an extra MR job to sample
>>> the data? In a plain Java MapReduce implementation, we can always use one
>>> MR job to implement the operation.
>>>
>>> Thank you for your time!!
>>>
>>> Best,
>>> Ruoyu
>
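
The point about partition ranges can be shown with a small, in-memory Python sketch. With several reducers, a partitioner that ignores key order leaves each reducer's output sorted, but the concatenated output is not. Boundaries taken from a sample of the keys make the concatenation globally sorted. With one reducer, everything already lands in one place, so no sample is needed. The helpers `quantile_boundaries` and `run_reducers` are hypothetical. The "sample" here is the whole input, so the result is deterministic. Python tuples compare lexicographically, so the same code covers ordering by multiple keys.

```python
import bisect
from typing import Callable, List, Sequence, Tuple

Key = Tuple[int, int]  # a two-field ORDER BY key, e.g. (f1, f2)


def quantile_boundaries(sample: Sequence[Key], num_reducers: int) -> List[Key]:
    """Pick num_reducers - 1 split keys at evenly spaced positions of the sorted sample."""
    s = sorted(sample)
    return [s[(i * len(s)) // num_reducers] for i in range(1, num_reducers)]


def run_reducers(keys: Sequence[Key], partition: Callable[[Key], int],
                 num_reducers: int) -> List[Key]:
    """Shuffle keys with `partition`, sort inside each reducer, concatenate in reducer order."""
    buckets: List[List[Key]] = [[] for _ in range(num_reducers)]
    for k in keys:
        buckets[partition(k)].append(k)
    return [k for b in buckets for k in sorted(b)]


keys: List[Key] = [(3, 1), (1, 9), (2, 2), (3, 0), (1, 1), (2, 7), (4, 4), (0, 5)]
n = 3

# Partitioning that ignores order (first field modulo n): each reducer is sorted,
# but the concatenated output is not.
mod_out = run_reducers(keys, lambda k: k[0] % n, n)

# Range partitioning on boundaries computed from a sample of the keys.
bounds = quantile_boundaries(keys, n)
range_out = run_reducers(keys, lambda k: bisect.bisect_right(bounds, k), n)

# One reducer: everything lands in one place, so no sampling is needed.
single_out = run_reducers(keys, lambda k: 0, 1)

print(bounds)                      # [(1, 9), (3, 0)]
print(mod_out == sorted(keys))     # False
print(range_out == sorted(keys))   # True
print(single_out == sorted(keys))  # True
```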
