xplenty created PIG-4758:
----------------------------

             Summary: In PigStorage when Using -tagPath or -tagFile Option 
columns order out of sync
                 Key: PIG-4758
                 URL: https://issues.apache.org/jira/browse/PIG-4758
             Project: Pig
          Issue Type: Bug
          Components: internal-udfs, piggybank
    Affects Versions: 0.15.0, 0.14.0
            Reporter: xplenty


When running the following script:

{code:borderStyle=solid}
a = LOAD 'data.csv' USING PigStorage('\t','-tagPath') AS (filepath:chararray, f1:chararray, f2:chararray);
b = FOREACH a GENERATE filepath, f2; 
dump b; 
{code}

The output contains the data from the _filepath_ and _f1_ fields instead of 
_filepath_ and _f2_. 
The cause is a bug in PigStorage (CSVExcelStorage is affected as well): it 
does not take the tagPath/tagFile column into account when calculating the 
_requiredColumns_ index:

{code:title=PigStorage.java|borderStyle=solid}
if (mRequiredColumns==null || (mRequiredColumns.length>fieldID && mRequiredColumns[fieldID]))
        addTupleValue(mProtoTuple, buf, start, i);
{code}

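However, _fieldID_ does not account for the tagFile/tagPath column, so the 
lookup into _mRequiredColumns_ appears to be shifted by one position whenever 
a tag option is enabled.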
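
To make the mismatch concrete, here is a minimal standalone sketch of the index check. It is not the actual PigStorage code. The class name, the project method and the offset parameter are hypothetical, and it assumes that the tag column is always prepended to the tuple and that the required-columns mask is indexed against the full schema (filepath, f1, f2). Shifting the lookup by one is only one possible way to fix this and may not be what the final patch does.

```java
import java.util.ArrayList;
import java.util.List;

public class TagColumnOffsetSketch {

    // Hypothetical stand-in for the field loop in PigStorage; not the real Pig code.
    // offset = 0 mimics the reported behaviour, offset = 1 shifts the mask lookup
    // past the tag column (an assumed fix, for illustration only).
    public static List<String> project(String tag, String[] dataFields,
                                       boolean[] requiredColumns, int offset) {
        List<String> tuple = new ArrayList<>();
        tuple.add(tag); // assumption: the tag column is always prepended
        for (int fieldID = 0; fieldID < dataFields.length; fieldID++) {
            int maskIndex = fieldID + offset;
            if (requiredColumns == null
                    || (requiredColumns.length > maskIndex && requiredColumns[maskIndex])) {
                tuple.add(dataFields[fieldID]);
            }
        }
        return tuple;
    }

    public static void main(String[] args) {
        // Schema is (filepath, f1, f2); the script projects filepath and f2.
        boolean[] required = {true, false, true};
        String[] row = {"f1-value", "f2-value"};

        // Mask is read at positions 0 and 1, so f1 is kept and f2 is dropped.
        System.out.println("offset 0: " + project("/data.csv", row, required, 0));
        // Mask is read at positions 1 and 2, so f1 is dropped and f2 is kept.
        System.out.println("offset 1: " + project("/data.csv", row, required, 1));
    }
}
```

With offset 0 the sketch returns the path and the f1 value, which matches the symptom described above. With offset 1 it returns the path and the f2 value.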



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
