[ https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1272:
----------------------------

    Status: Patch Available  (was: Reopened)

> Column pruner causes wrong results
> ----------------------------------
>
>                 Key: PIG-1272
>                 URL: https://issues.apache.org/jira/browse/PIG-1272
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: PIG-1272-1.patch, PIG-1272-2.patch
>
>
> For a simple script the column pruner optimization removes certain columns
> from the original relation, which results in wrong results.
> Input file "kv" contains the following columns (tab separated)
> {code}
> a	1
> a	2
> a	3
> b	4
> c	5
> c	6
> b	7
> d	8
> {code}
> Now running this script in Pig 0.6 produces
> {code}
> kv = load 'kv' as (k,v);
> keys= foreach kv generate k;
> keys = distinct keys;
> keys = limit keys 2;
> rejoin = join keys by k, kv by k;
> dump rejoin;
> {code}
> (a,a)
> (a,a)
> (a,a)
> (b,b)
> (b,b)
> Running this in Pig 0.5 version without column pruner results in:
> (a,a,1)
> (a,a,2)
> (a,a,3)
> (b,b,4)
> (b,b,7)
> When we disable the "ColumnPruner" optimization it gives right results.
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
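
For anyone verifying a patch, the result the report expects can be cross-checked outside Pig. The following is only a rough Python sketch of the script's intended semantics, not Pig code and not how Pig evaluates the plan. The names KV_TEXT, load and rejoin are made up for illustration, and the distinct keys are sorted purely so that the limit picks "a" and "b" as in the report.

```python
# Sketch of what the script in the report should compute, independent of Pig.
# Assumption: KV_TEXT stands in for the tab-separated input file "kv".
KV_TEXT = "a\t1\na\t2\na\t3\nb\t4\nc\t5\nc\t6\nb\t7\nd\t8\n"


def load(text):
    """Parse tab-separated lines into (k, v) tuples, like: load 'kv' as (k,v)."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line]


def rejoin(kv, limit=2):
    """Mirror the foreach / distinct / limit / join steps of the script."""
    # keys = foreach kv generate k;  keys = distinct keys;  keys = limit keys 2;
    # Sorted here only to make the sketch deterministic.
    keys = sorted({k for k, _ in kv})[:limit]
    # rejoin = join keys by k, kv by k;
    # The join output keeps every column of both inputs, including v.
    return [(key, k, v) for key in keys for k, v in kv if k == key]


if __name__ == "__main__":
    for row in rejoin(load(KV_TEXT)):
        print("(" + ",".join(row) + ")")
```

Every row it prints has three fields, matching the Pig 0.5 output quoted above. The two-field rows from Pig 0.6 are missing the v column of kv.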