[ 
https://issues.apache.org/jira/browse/PIG-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392540#comment-15392540
 ] 

Rohini Palaniswamy commented on PIG-4958:
-----------------------------------------

The approach in the patch, which makes a DAGClient call from the task, 
requires getting an RM token and passing it to the job. I talked with [~jlowe], 
and he does not like the idea of talking to the RM from a task.

A task in the target vertex needs the OUTPUT_BYTES counter of every one of 
its input vertices. That leaves two problems:
Problem 1 - Get the counter value
Problem 2 - Pass it to the task
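
For illustration only, here is a minimal Java sketch of what the task would do 
with those counters once it has them. It does not address either problem 
above; it assumes the per-vertex OUTPUT_BYTES values have already been 
delivered somehow. The class name, the vertex names, and the bytes-per-reducer 
and max-reducer values are hypothetical and are not taken from the patch.

```java
import java.util.Map;

public class ParallelismEstimate {

    // Hypothetical helper, not part of Pig or Tez.
    // Sums the OUTPUT_BYTES of all input vertices and derives a reducer count
    // from the serialized size, clamped to the range [1, maxReducers].
    static int estimate(Map<String, Long> outputBytesByVertex,
                        long bytesPerReducer, int maxReducers) {
        long totalBytes = 0;
        for (long bytes : outputBytesByVertex.values()) {
            totalBytes += bytes;
        }
        // Ceiling division: a partially filled reducer still counts as one.
        long reducers = (totalBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, Math.min(maxReducers, reducers));
    }

    public static void main(String[] args) {
        // Assumed counter values for two made-up input vertices.
        Map<String, Long> counters = Map.of(
                "scope-10", 1_500_000_000L,
                "scope-20", 600_000_000L);
        // 2.1 GB of serialized output at 1 GB per reducer.
        System.out.println(estimate(counters, 1_000_000_000L, 999)); // prints 3
    }
}
```

The sketch works from serialized bytes, which is the basis the issue 
description says Mapreduce uses for its estimate.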

I had already looked at other options. There does not seem to be a good way to 
do it with VertexManagerPlugin: there is no API to get counters or to send 
events to another VertexManagerPlugin class in the AM. 

[~bikassaha]/[~hitesh]/[~sseth],
    Is there a cleaner, simpler way to do this that avoids 
DAGClientImplRPC? 

> Tez autoparallelism estimation for order by is higher than mapreduce
> --------------------------------------------------------------------
>
>                 Key: PIG-4958
>                 URL: https://issues.apache.org/jira/browse/PIG-4958
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.17.0
>
>         Attachments: PIG-4958-withoutsecurity.patch
>
>
>   The input size is calculated from the size of the samples in memory. The 
> in-memory size is usually 4x the serialized size or more. Mapreduce estimates 
> the number of reducers based on the serialized size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
