Thanks!!

Best,
Ruoyu

On May 23, 2014, at 1:11 PM, Daniel Dai <[email protected]> wrote:

> The first job simply reads the input and dumps it to HDFS. The first job is
> needed because:
> 1. SampleLoader does not work with a non-HDFS loader
> 2. SampleLoader does not process any operators before "order by"
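
To make the two reasons above concrete, here is a minimal Python sketch of the idea. It is not Pig code. A local JSON-lines file stands in for HDFS, and the names `materialize`, `sample_keys`, and `pre_sort_ops` are hypothetical. The assumption is that the first job applies whatever operators come before the sort and stores the result, so that the sampler only has to read stored rows and extract the sort keys.

```python
import json
import random

def materialize(rows, pre_sort_ops, path):
    """Stage 1 (map-only): apply the operators that precede the sort, then store the rows."""
    with open(path, "w") as out:
        for row in rows:
            for op in pre_sort_ops:
                row = op(row)
                if row is None:  # this operator filtered the row out
                    break
            else:
                # Reached only when no operator dropped the row.
                out.write(json.dumps(row) + "\n")

def sample_keys(path, key_fields, sample_size, seed=0):
    """Stage 2: read the stored rows back and return a sorted sample of sort keys."""
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    rng = random.Random(seed)
    picked = rng.sample(rows, min(sample_size, len(rows)))
    # A multi-key "order by" is modelled as a tuple of the key fields.
    return sorted(tuple(row[k] for k in key_fields) for row in picked)
```

In this sketch the sampler never sees the filter or the original input source. It only reads the intermediate file, which is why that file has to exist before sampling starts.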
> 
> In some cases the first job can be optimized out, see
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer
> 
> Thanks,
> Daniel
> 
> On Thu, May 22, 2014 at 8:04 PM, Ruoyu Liu <[email protected]> wrote:
>> However, if the operation orders by multiple keys, there will be three
>> jobs. The second and third jobs are similar to the two jobs used
>> when ordering by one key. Can anyone point out what the first map-only job
>> does?
>> Also, can anyone point me to the right place to find the execution details
>> of the various Pig operations?
>> 
>> Thanks!!
>> Ruoyu
>> 
>> On May 22, 2014, at 1:32 PM, Rohini Palaniswamy <[email protected]> wrote:
>> 
>>> If there is just one reducer there is no need for sampling (PIG-2784), but
>>> when there is more than one reducer in order by you need to sample the data
>>> and determine the partition ranges so that you can do a Distributed Orderby.
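
The sampling step described above can be illustrated with a small, single-process Python sketch. It is not Pig's implementation. The function names and the `num_reducers` and `sample_size` defaults are assumptions chosen for illustration. A random sample of the keys yields split points, each record is routed to the "reducer" that owns its key range, and each reducer sorts only its own share.

```python
import bisect
import random

def sample_boundaries(records, num_reducers, sample_size, seed=0):
    """Pick up to num_reducers - 1 split points from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    if not sample:
        return []
    # Evenly spaced quantiles of the sorted sample become the range boundaries.
    return [sample[(i * len(sample)) // num_reducers] for i in range(1, num_reducers)]

def range_partition(records, boundaries):
    """Route each record to the partition whose key range contains it."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        partitions[bisect.bisect_right(boundaries, rec)].append(rec)
    return partitions

def distributed_order_by(records, num_reducers=4, sample_size=100):
    boundaries = sample_boundaries(records, num_reducers, sample_size)
    partitions = range_partition(records, boundaries)
    # Each "reducer" sorts independently; because the key ranges do not overlap,
    # concatenating the outputs in partition order gives a total order.
    return [rec for part in partitions for rec in sorted(part)]
```

With a single reducer the list of boundaries is empty and every record lands in one partition, so the sample is never needed, which matches the one-reducer case mentioned above. The sample only affects how evenly the work is spread across reducers; the output is correctly sorted whatever boundaries are chosen.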
>>> 
>>> Regards,
>>> Rohini
>>> 
>>> 
>>> On Thu, May 22, 2014 at 10:37 AM, Ruoyu Liu <[email protected]> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I’m looking at the execution process of several operations and have a
>>>> question that may be naive; I hope someone can help me.
>>>> For operations like Order by, why do we use an extra MR job to sample
>>>> the data? In a Java implementation, we can always use one MR job
>>>> to implement the operation.
>>>> 
>>>> Thank you for your time!!
>>>> 
>>>> Best,
>>>> Ruoyu
>> 
> 
