[ 
https://issues.apache.org/jira/browse/PIG-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948023#comment-13948023
 ] 

Rohini Palaniswamy edited comment on PIG-3743 at 3/26/14 4:05 PM:
------------------------------------------------------------------

[~cheolsoo],
 
POLocalRearrangeTez.java    
{code}
if (isUnion) {
    // Use the entire tuple as both key and value
    key = HDataType.getWritableComparableTypes(result.get(1), keyType);
    val = new NullableTuple((Tuple) result.get(1));
}
{code}

This seems very inefficient: the map output size doubles, and the key is
just discarded. Can we use POValueOutputTez instead of POLocalRearrangeTez,
with RoundRobinPartitioner? Until now, POValueOutputTez was only used with
unordered input. Since the key is always POValueOutputTez.EmptyWritable,
can you change it to implement WritableComparable and always return 1 (or -1,
depending on how the comparator is called in the reducer), so that no two
records compare as equal and they are not grouped together? If having every
key claim to be greater than every other confuses the ordering and does not
work, then we cannot use WritableComparable and will have to go with each
reducer receiving a single KV pair with all records grouped together. That
might add overhead, but it should still be better than duplicating the value
in the key and doubling the memory requirements.
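
To make the two alternatives concrete, here is a minimal, self-contained sketch of the grouping behaviour only. It does not use Hadoop or Tez classes: EmptyKey, NEVER_EQUAL, ALWAYS_EQUAL and group() are hypothetical stand-ins, and the loop only assumes that a reducer starts a new group whenever the comparator reports a non-zero result for consecutive keys. How the comparator is actually invoked during the sort is not modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class EmptyKeyGroupingSketch {

    /** Hypothetical stand-in for an empty key; it carries no data. */
    static final class EmptyKey {}

    /** The "always return 1" idea: no two keys ever compare as equal. */
    static final Comparator<EmptyKey> NEVER_EQUAL = (a, b) -> 1;

    /** The fallback: all empty keys compare as equal. */
    static final Comparator<EmptyKey> ALWAYS_EQUAL = (a, b) -> 0;

    /**
     * Groups consecutive values whose keys compare as equal.
     * This is an assumed model of reducer-side grouping, not Tez code.
     */
    static List<List<String>> group(List<String> values, Comparator<EmptyKey> cmp) {
        List<List<String>> groups = new ArrayList<>();
        EmptyKey prev = null;
        for (String value : values) {
            EmptyKey cur = new EmptyKey();
            // Start a new group for the first record or when keys differ.
            if (prev == null || cmp.compare(prev, cur) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(value);
            prev = cur;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("t1", "t2", "t3");
        System.out.println(group(values, NEVER_EQUAL));  // [[t1], [t2], [t3]]
        System.out.println(group(values, ALWAYS_EQUAL)); // [[t1, t2, t3]]
    }
}
```

A comparator that never returns 0 also never satisfies the usual compareTo symmetry contract, which is the ordering concern raised above; the sketch avoids sorting for that reason.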

We will also have to clean up the isUnion code in POLocalRearrangeTez if this
solution works.


was (Author: rohini):
[~cheolsoo],
 
POLocalRearrangeTez.java    
{code}
if (isUnion) {
    // Use the entire tuple as both key and value
    key = HDataType.getWritableComparableTypes(result.get(1), keyType);
    val = new NullableTuple((Tuple) result.get(1));
}
{code}

This seems very inefficient: the map output size doubles, and the key is
just discarded. Can we use POValueOutputTez instead of POLocalRearrangeTez,
with RoundRobinPartitioner?

> Use VertexGroup and Alias vertex for union
> ------------------------------------------
>
>                 Key: PIG-3743
>                 URL: https://issues.apache.org/jira/browse/PIG-3743
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>            Assignee: Cheolsoo Park
>             Fix For: tez-branch
>
>         Attachments: PIG-3743-1.patch, PIG-3743-2.patch, 
> PIG-3743-fix-skew.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
