[jira] [Commented] (PIG-4958) Tez autoparallelism estimation for order by is higher than mapreduce

Bikas Saha (JIRA) Mon, 25 Jul 2016 13:21:46 -0700

    [ https://issues.apache.org/jira/browse/PIG-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392591#comment-15392591 ]


Bikas Saha commented on PIG-4958:
---------------------------------

Also, this might overload the RM if there are many such tasks. Hitesh did 
some work to fetch the client tokens and pass them to the AM, so that the AM 
could use the client RM protocol internally to get cluster node info. The AMs 
could then potentially push that info to the tasks.

Separately, we could consider sending a processor payload update event 
(creating one if it does not exist) to pass consolidated stats information 
from the vertex manager to all the tasks. This could enable many other 
scenarios where updated info needs to be sent to tasks.

> Tez autoparallelism estimation for order by is higher than mapreduce
> --------------------------------------------------------------------
>
>                 Key: PIG-4958
>                 URL: https://issues.apache.org/jira/browse/PIG-4958
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0
>
>         Attachments: PIG-4958-withoutsecurity.patch
>
>
>   The input size is calculated from the size of the samples in memory. Size 
> in memory is usually 4x or more the serialized size. MapReduce estimates 
> the number of reducers based on serialized size.
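
To illustrate why the in-memory measurement inflates the order by parallelism, here is a minimal sketch of a size-based reducer estimate. It assumes the estimate is roughly ceil(input bytes / bytes per reducer), clamped to a maximum; the constants below are illustrative values assumed to mirror Pig's pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max settings, and the 10 GB input and 4x inflation factor are hypothetical. This is not the actual Pig or Tez estimator code.

```python
import math

# Illustrative values, assumed to mirror Pig's MapReduce-side settings
# pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max.
BYTES_PER_REDUCER = 1_000_000_000
MAX_REDUCERS = 999


def estimate_reducers(input_bytes: int,
                      bytes_per_reducer: int = BYTES_PER_REDUCER,
                      max_reducers: int = MAX_REDUCERS) -> int:
    """Size-based estimate: one reducer per bytes_per_reducer, clamped to [1, max_reducers]."""
    reducers = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(reducers, max_reducers))


# Hypothetical input: 10 GB serialized, with an assumed 4x inflation
# when the same records are measured as in-memory objects.
serialized_bytes = 10 * 1_000_000_000
in_memory_bytes = 4 * serialized_bytes

mr_estimate = estimate_reducers(serialized_bytes)   # based on serialized size
tez_estimate = estimate_reducers(in_memory_bytes)   # based on in-memory sample size

print(f"serialized-size estimate: {mr_estimate}")   # prints 10
print(f"in-memory-size estimate:  {tez_estimate}")  # prints 40
```

Because the estimate scales linearly with the input size until it hits the cap, any constant in-memory overhead factor carries straight through to the reducer count.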



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
