Koji Noguchi created PIG-5231:
---------------------------------
Summary: PigStorage with -schema may produce inconsistent outputs
with more fields
Key: PIG-5231
URL: https://issues.apache.org/jira/browse/PIG-5231
Project: Pig
Issue Type: Bug
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
When multiple directories are passed to PigStorage(',','-schema'), Pig behaves as follows:
{quote}
No attempt to merge conflicting schemas is made during loading. The first
schema encountered during a file system scan is used.
{quote}
For an input of two directories with the schemas
file1: (f1:chararray, f2:int) and
file2: (f1:chararray, f2:int, f3:int)
Pig picks the first schema, from file1, and only allows access to f1 and f2.
However, the output still contains 3 fields for tuples from file2. This later
leads to completely corrupt output, because the shifted fields result in
incorrect references.
(This may also happen when input itself contains the delimiter.)
If the file2 schema is picked instead, the mismatch is already handled by
filling the missing fields with null (PIG-3100).
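
The field shift can be illustrated with a small Python model. This is only a sketch of the behaviour described in this issue, not Pig's actual loader code: the helper names (load_line, pad_only, pad_and_trim, store) and the sample rows are hypothetical, and trimming to the schema width is shown as one possible remedy, not as the committed fix.

```python
from typing import List, Optional

Row = List[Optional[str]]

def load_line(line: str, delimiter: str = ",") -> Row:
    """Split a raw input line on the delimiter, as a delimited-text loader would."""
    return list(line.split(delimiter))

def pad_only(fields: Row, width: int) -> Row:
    """Model of the reported behaviour: short tuples are padded with null,
    but tuples wider than the chosen schema keep their extra fields."""
    return fields + [None] * (width - len(fields))

def pad_and_trim(fields: Row, width: int) -> Row:
    """Assumed remedy: pad short tuples and also drop fields beyond the schema."""
    return pad_only(fields, width)[:width]

def store(fields: Row, extra: str, delimiter: str = ",") -> str:
    """Append one computed column and write the tuple back out as text."""
    return delimiter.join("" if f is None else f for f in fields + [extra])

# Schema taken from file1: (f1, f2), so the expected width is 2.
rows = ["a,1", "b,2,3"]  # first row from file1, second row from file2

# With padding only, the file2 row keeps its third field, so the appended
# column lands in position 4 instead of position 3.
shifted = [store(pad_only(load_line(r), 2), "x") for r in rows]

# With padding and trimming, every output row has the same layout.
aligned = [store(pad_and_trim(load_line(r), 2), "x") for r in rows]

print(shifted)
print(aligned)
```

In this model, a consumer that reads the third output column expects the appended value for every row but gets the leftover f3 value for rows that came from file2.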
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)