Koji Noguchi created PIG-5231:
---------------------------------

             Summary: PigStorage with -schema may produce inconsistent outputs with more fields
                 Key: PIG-5231
                 URL: https://issues.apache.org/jira/browse/PIG-5231
             Project: Pig
          Issue Type: Bug
            Reporter: Koji Noguchi
            Assignee: Koji Noguchi
            Priority: Minor


When multiple directories are passed to PigStorage(',','-schema'), Pig's 
behavior is:
{quote}
No attempt to merge conflicting schemas is made during loading. The first 
schema encountered during a file system scan is used.
{quote}
For input from two directories with the schemas
file1: (f1:chararray, f2:int) and 
file2: (f1:chararray, f2:int, f3:int) 

Pig will pick the first schema, from file1, and only allow access to f1 and 
f2. However, the output will still contain 3 fields for tuples from file2. 
This later leads to completely corrupt output, because the shifted fields 
cause incorrect references. 
(This may also happen when the input itself contains the delimiter.)
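
The shift can be illustrated with a small Python simulation. This is not Pig 
code and not how PigStorage is implemented; the names SCHEMA, load and 
append_field are hypothetical. It assumes only that the loader splits each 
line on the delimiter without trimming the tuple to the schema width, and 
that a later step appends a field and then refers to it by the position the 
schema predicts.

```python
# Illustrative simulation only; this is not Pig code.
SCHEMA = ["f1", "f2"]  # the first schema found (from file1)


def load(lines, delimiter=","):
    """Split each line on the delimiter without trimming to the schema width."""
    return [tuple(line.split(delimiter)) for line in lines]


def append_field(rows, value):
    """Append one field to every tuple, as a later pipeline step might."""
    return [row + (value,) for row in rows]


file1 = ["a,1"]      # matches the (f1, f2) schema
file2 = ["b,2,3"]    # carries an extra f3 that the schema does not know about

rows = append_field(load(file1 + file2), "X")

# Going by the schema, the appended field should sit right after f1 and f2.
new_field_index = len(SCHEMA)

for row in rows:
    print(row, "->", row[new_field_index])
```

For the file1 tuple the positional reference finds the appended value, while 
for the file2 tuple it lands on the undeclared f3 instead.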

If the schema from file2 is picked instead, the mismatch is already handled 
by filling the missing fields with null (PIG-3100).
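
As a rough sketch of that opposite case (again a Python illustration, not the 
actual PIG-3100 code; pad_to_schema is a hypothetical helper), short tuples 
are padded with nulls up to the schema width:

```python
# Illustrative only: pad short tuples with None up to the schema width.
def pad_to_schema(row, width):
    """Return row extended with None so it has at least `width` fields."""
    return row + (None,) * (width - len(row))


wide_schema = ["f1", "f2", "f3"]  # schema taken from file2

print(pad_to_schema(("a", "1"), len(wide_schema)))       # tuple from file1
print(pad_to_schema(("b", "2", "3"), len(wide_schema)))  # tuple from file2
```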




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
