[ 
https://issues.apache.org/jira/browse/PIG-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4377:
----------------------------
    Attachment: PIG-4377-1.patch

Attached a fix.

Here is what happens:
1. A certain key x is sampled (by PoissonSampleLoader/PartitionSkewedKeys) and 
assigned y reducers
2. In reality, only y1 < y records carry key x
3. As a result, some reducers that are supposed to get key x never get a row 
with key x
4. Each reducer that does not get x generates a redundant empty left relation 
(CompilerUtils.addEmptyBagOuterJoin)

What the patch does:
Generate the empty left relation only in the first reducer of key x
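
The steps above can be illustrated with a toy simulation. This is only a 
sketch, not Pig's actual code: the function name, the round-robin spreading of 
left rows, and the replication of right rows to every reducer of the key are 
all assumptions made for illustration. It models one key x that was assigned 
y = 3 reducers while only y1 = 2 left rows carry it.

```python
def skewed_right_outer_join(left, right, reducers_per_key, first_reducer_only):
    """Toy model of a skewed outer join (hypothetical, not Pig's implementation).

    left, right: lists of (key, value) tuples.
    reducers_per_key: dict mapping key -> number of reducers the sampler assigned.
    first_reducer_only: if True, only the first reducer of a key may pad with None.
    """
    out = []
    for key, n in reducers_per_key.items():
        left_rows = [v for k, v in left if k == key]
        right_rows = [v for k, v in right if k == key]
        # Assumption: left rows are spread round-robin over the key's n reducers.
        buckets = [left_rows[i::n] for i in range(n)]
        for idx, bucket in enumerate(buckets):
            # Assumption: right rows are replicated to every reducer of the key.
            if bucket:
                out += [(l, r) for l in bucket for r in right_rows]
            elif not first_reducer_only or idx == 0:
                # No left row arrived here, so the outer join pads the left side.
                out += [(None, r) for r in right_rows]
    return out


left = [("x", "L1"), ("x", "L2")]   # only y1 = 2 records carry key x
right = [("x", "R1")]
reducers = {"x": 3}                 # sampler assigned y = 3 reducers

# Before the patch: the third reducer sees no left row and emits an extra row.
buggy = skewed_right_outer_join(left, right, reducers, first_reducer_only=False)
# After the patch: only the first reducer of key x may emit the padded row.
fixed = skewed_right_outer_join(left, right, reducers, first_reducer_only=True)
```

In this model the unpatched run contains one extra row with a null left side, 
which matches the symptom in the issue description, and restricting the 
padding to the first reducer removes it.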

> Skewed outer join produce wrong result in some cases
> ----------------------------------------------------
>
>                 Key: PIG-4377
>                 URL: https://issues.apache.org/jira/browse/PIG-4377
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.15.0
>
>         Attachments: PIG-4377-1.patch, reproduce.patch
>
>
> Skewed outer join produces more rows than expected under certain conditions. 
> The extra rows contain a null left relation. Can be reproduced reliably with 
> reproduce.patch (run SkewedJoin_11).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
