[ https://issues.apache.org/jira/browse/PIG-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-435: ------------------------------- Fix Version/s: 0.9.0 > wrong columns produced if incomplete definition provided during load > -------------------------------------------------------------------- > > Key: PIG-435 > URL: https://issues.apache.org/jira/browse/PIG-435 > Project: Pig > Issue Type: Bug > Affects Versions: 0.2.0 > Reporter: Olga Natkovich > Assignee: Pradeep Kamath > Priority: Minor > Fix For: 0.9.0 > > > Scrip: > A = load 'studenttab10k' as (name); -- note that data has more than 1 column > B = load 'votertab10k' as (name, age, reg, contrib); > D = COGROUP A by name, B by name; > E = foreach D generate flatten(A), flatten(B); > F = foreach E generate registration, contr; > dump F; > The dump produces the wrong columns. This is because even though we declared > only one column, we actually load all columns of A. So any place where we > explicitely or implicitely use A.* as the case in flatten, we would produce > the wrong results. > The long term solution is actually to push projections into the load. Shorter > term the proposal is to notice if the script uses A.* and stick a project > after the load. Note that we don't need to do that if types are declared > because there will be already casting foreach there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.