[
https://issues.apache.org/jira/browse/PIG-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Travis Woodruff updated PIG-4018:
---------------------------------
Attachment: PIG-4018.patch
This is an attempt at a patch. It moves {{UnionOnSchemaSetter}} above
{{ImplicitSplitInsertVisitor}} in {{LogicalPlan.validate()}}, so that it runs first.
It also contains a small test that demonstrates the problem (apply the test
without the {{LogicalPlan}} change to see it fail).
This works in my testing, but I'm not very familiar with the logical plan
generation process, so I'm not certain about the side effects. The visitors
that sit between the old location and the new one, and so now run after
{{UnionOnSchemaSetter}} instead of before it, are:
- {{ImplicitSplitInsertVisitor}} -- expected, since running before it is the
point of the change
- {{DuplicateForEachColumnRewriteVisitor}} -- {{UnionOnSchemaSetter}} adds a
few more FOREACHes for this to check, but that should not have a noticeable
impact
- {{TypeCheckingRelVisitor}} -- again, a couple more FOREACHes to check, but
there should be no functional impact.
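
To make the ordering issue concrete, here is a toy model in plain Java. It is not Pig code: {{ToySchema}}, {{validateUids}}, {{fillMissingUids}} and {{runPasses}} are hypothetical stand-ins for the real schema, the check triggered from {{ImplicitSplitInsertVisitor}}, and {{UnionOnSchemaSetter}}. It only illustrates why running the fix-up pass first avoids the exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class PassOrderSketch {
    /** Toy stand-in for a logical schema: one uid per column, -1 means "not assigned". */
    static final class ToySchema {
        final List<Long> uids = new ArrayList<>();
    }

    // Old order: validation runs before the fix-up (assumed analogue of the bug).
    static final List<Consumer<ToySchema>> OLD_ORDER =
            List.of(PassOrderSketch::validateUids, PassOrderSketch::fillMissingUids);

    // New order: fix-up runs before validation (assumed analogue of the patch).
    static final List<Consumer<ToySchema>> NEW_ORDER =
            List.of(PassOrderSketch::fillMissingUids, PassOrderSketch::validateUids);

    // Hypothetical stand-in for the check reached via schemaResetter.visit().
    static void validateUids(ToySchema schema) {
        for (long uid : schema.uids) {
            if (uid == -1) {
                throw new IllegalStateException("invalid uid -1 in schema");
            }
        }
    }

    // Hypothetical stand-in for UnionOnSchemaSetter: gives unset columns a fresh uid.
    static void fillMissingUids(ToySchema schema) {
        long next = 100;
        for (int i = 0; i < schema.uids.size(); i++) {
            if (schema.uids.get(i) == -1) {
                schema.uids.set(i, next++);
            }
        }
    }

    /** Runs the passes in order on a schema with one unset uid; true if none throws. */
    static boolean runPasses(List<Consumer<ToySchema>> passes) {
        ToySchema schema = new ToySchema();
        schema.uids.add(1L);   // column present in the first union input
        schema.uids.add(-1L);  // column only present in a later input
        try {
            for (Consumer<ToySchema> pass : passes) {
                pass.accept(schema);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("old order passes: " + runPasses(OLD_ORDER));
        System.out.println("new order passes: " + runPasses(NEW_ORDER));
    }
}
```

In this sketch the old order fails and the new order succeeds. Whether the real visitors in between are unaffected is a separate question that the toy model does not answer.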
> Schema validation fails with UNION ONSCHEMA
> -------------------------------------------
>
> Key: PIG-4018
> URL: https://issues.apache.org/jira/browse/PIG-4018
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.13.0
> Reporter: Travis Woodruff
> Assignee: Travis Woodruff
> Attachments: PIG-4018.patch
>
>
> When relations with differing schemas are unioned (using UNION ONSCHEMA),
> schema validation can fail with this exception:
> {{org.apache.pig.impl.plan.PlanValidationException: Logical plan invalid
> state: invalid uid -1 in schema}}
> This worked before the fix for PIG-3492.
> The merged schema (from {{LOUnion.getSchema()}}) does not contain uids for
> columns not in the schema of the first input (uids are set to -1). This is
> because only the first input's schema is used for looking up "cached" uids.
> Normally, this isn't a problem because {{UnionOnSchemaSetter}} comes along
> and fixes the missing fields.
> However, when {{ImplicitSplitInsertVisitor}} is active, it is called before
> {{UnionOnSchemaSetter}}. {{ImplicitSplitInsertVisitor}} calls
> {{schemaResetter.visit()}}, which throws the validation exception because
> {{UnionOnSchemaSetter}} has not had a chance to create the missing fields
> (so the uids are still -1 for these fields).
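
As a rough illustration of the uid behaviour described in the quoted report, the following Java sketch merges column names from several inputs but looks up uids only in the first one. It is a simplified model under that stated assumption, not the actual {{LOUnion.getSchema()}} code; {{MergedUidSketch}} and {{mergeUids}} are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergedUidSketch {
    /**
     * Merges the columns of all inputs (column name -> uid). Uids are looked up
     * only in the first input, so columns that appear solely in later inputs
     * end up with -1. This mirrors the description, not Pig's implementation.
     */
    static Map<String, Long> mergeUids(List<Map<String, Long>> inputs) {
        Map<String, Long> first = inputs.get(0);
        Map<String, Long> merged = new LinkedHashMap<>();
        for (Map<String, Long> input : inputs) {
            for (String column : input.keySet()) {
                merged.putIfAbsent(column, first.getOrDefault(column, -1L));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> a = new LinkedHashMap<>();
        a.put("x", 1L);
        a.put("y", 2L);
        Map<String, Long> b = new LinkedHashMap<>();
        b.put("x", 3L);
        b.put("z", 4L);
        // "z" exists only in the second input, so its merged uid is -1.
        System.out.println(mergeUids(List.of(a, b))); // prints {x=1, y=2, z=-1}
    }
}
```

A validation step that rejects -1 would fail on the merged result here until some later pass assigns a real uid to {{z}}.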