[ https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1272:
----------------------------
Status: Patch Available (was: Open)
> Column pruner causes wrong results
> ----------------------------------
>
> Key: PIG-1272
> URL: https://issues.apache.org/jira/browse/PIG-1272
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0
> Reporter: Viraj Bhat
> Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1272-1.patch
>
>
> For a simple script, the column pruner optimization removes columns from
> the original relation that are still needed, which produces wrong results.
> Input file "kv" contains the following two tab-separated columns:
> {code}
> a 1
> a 2
> a 3
> b 4
> c 5
> c 6
> b 7
> d 8
> {code}
> Running this script in Pig 0.6 drops the v column from the joined output (results follow the script):
> {code}
> kv = load 'kv' as (k,v);
> keys = foreach kv generate k;
> keys = distinct keys;
> keys = limit keys 2;
> rejoin = join keys by k, kv by k;
> dump rejoin;
> {code}
> (a,a)
> (a,a)
> (a,a)
> (b,b)
> (b,b)
> Running the same script in Pig 0.5, without the column pruner, gives the expected output:
> (a,a,1)
> (a,a,2)
> (a,a,3)
> (b,b,4)
> (b,b,7)
> When we disable the "ColumnPruner" optimization, the script gives the correct results.
> Viraj
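
As a cross-check of the expected output, the script's relational steps can be modeled outside Pig. The following is a minimal sketch in plain Python, not Pig code. The function name `rejoin` and the sorted ordering of the distinct keys are assumptions made for illustration; Pig does not promise which two keys `limit` keeps, and the sort is used here only so that the sketch keeps `a` and `b` as in the report.

```python
# Plain-Python model of the script's steps (illustrative only, not Pig).
rows = [("a", 1), ("a", 2), ("a", 3), ("b", 4),
        ("c", 5), ("c", 6), ("b", 7), ("d", 8)]  # contents of 'kv'


def rejoin(kv, limit=2):
    # keys = foreach kv generate k; keys = distinct keys; keys = limit keys 2;
    # Assumption: sorted order, so the limit keeps 'a' and 'b' as in the report.
    keys = sorted({k for k, _ in kv})[:limit]
    # rejoin = join keys by k, kv by k;
    # Every column of kv (k and v) must survive the join.
    return [(key, k, v) for key in keys for k, v in kv if k == key]


result = rejoin(rows)
for row in result:
    print(row)
```

Each joined tuple has three fields (`keys::k`, `kv::k`, `kv::v`), which matches the Pig 0.5 output. The two-field tuples from Pig 0.6 are what the join would return if `v` had been pruned from `kv` before the join.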