xplenty created PIG-4758:
----------------------------
Summary: In PigStorage, when using the -tagPath or -tagFile option, column order is out of sync
Key: PIG-4758
URL: https://issues.apache.org/jira/browse/PIG-4758
Project: Pig
Issue Type: Bug
Components: internal-udfs, piggybank
Affects Versions: 0.15.0, 0.14.0
Reporter: xplenty
When using the following script:
{code:borderStyle=solid}
a = LOAD 'data.csv' USING PigStorage('\t', '-tagPath') AS (filepath:chararray, f1:chararray, f2:chararray);
b = FOREACH a GENERATE filepath, f2;
dump b;
{code}
The output contains the data from the _filepath_ and _f1_ fields instead of the _filepath_ and _f2_ fields.
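The following Java sketch mimics this symptom under simplifying assumptions: the schema (filepath, f1, f2) is turned into a required-column mask indexed by schema position, the tag column is always prepended to the tuple, and each parsed field is checked against the mask by its own fieldID. The class and method names are hypothetical; this is an illustration, not the actual PigStorage code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the reported behavior; not the real PigStorage code.
public class TagPathBugDemo {

    // Builds a mask indexed by schema position. With -tagPath, position 0 is
    // the file path and the data fields start at position 1.
    static boolean[] requiredMask(int schemaSize, int... requiredPositions) {
        boolean[] mask = new boolean[schemaSize];
        for (int p : requiredPositions) {
            mask[p] = true;
        }
        return mask;
    }

    // Mimics the buggy check: the mask is consulted with fieldID, which counts
    // only the fields parsed from the line and ignores the prepended tag column.
    static List<String> buggyProject(String path, String line, boolean[] mask) {
        List<String> out = new ArrayList<>();
        out.add(path); // assumption: the tag column is always prepended
        String[] fields = line.split("\t", -1);
        for (int fieldID = 0; fieldID < fields.length; fieldID++) {
            if (mask == null || (mask.length > fieldID && mask[fieldID])) {
                out.add(fields[fieldID]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Schema (filepath, f1, f2); the script asks for filepath and f2.
        boolean[] mask = requiredMask(3, 0, 2);
        System.out.println(buggyProject("data.csv", "v1\tv2", mask)); // [data.csv, v1]
    }
}
```

The mask flag for filepath (position 0) ends up selecting f1 (fieldID 0), and the flag for f2 (position 2) is never reached by a parsed field of this two-column line.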
This is caused by a bug in PigStorage (it also occurs in CSVExcelStorage): it does not take the tagPath/tagFile column into account when indexing into _mRequiredColumns_:
{code:title=PigStorage.java|borderStyle=solid}
if (mRequiredColumns == null || (mRequiredColumns.length > fieldID && mRequiredColumns[fieldID]))
    addTupleValue(mProtoTuple, buf, start, i);
{code}
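Here _fieldID_ counts only the fields parsed from the input line, while _mRequiredColumns_ is indexed by schema position, which includes the tagFile/tagPath column prepended at position 0. Each required-column flag is therefore checked against the field one position earlier, which is why _f1_ is loaded in place of _f2_.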
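One possible correction, sketched in Java under the assumption that the tag column always occupies schema position 0, is to translate _fieldID_ into a schema position before consulting the mask. The names TagPathFixSketch, isRequired and project are hypothetical, and the patch actually committed for PIG-4758 may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a corrected index check; not the committed fix.
public class TagPathFixSketch {

    // Maps fieldID (position among parsed fields) to the schema position used
    // by requiredColumns, skipping the tag column when tagging is enabled.
    static boolean isRequired(boolean[] requiredColumns, int fieldID, boolean tagged) {
        if (requiredColumns == null) {
            return true; // no projection pushed down: keep every field
        }
        int schemaPos = tagged ? fieldID + 1 : fieldID;
        return requiredColumns.length > schemaPos && requiredColumns[schemaPos];
    }

    // A null path means tagging is off; otherwise the path is prepended
    // unconditionally in this sketch.
    static List<String> project(String path, String line, boolean[] requiredColumns) {
        List<String> out = new ArrayList<>();
        boolean tagged = path != null;
        if (tagged) {
            out.add(path);
        }
        String[] fields = line.split("\t", -1);
        for (int fieldID = 0; fieldID < fields.length; fieldID++) {
            if (isRequired(requiredColumns, fieldID, tagged)) {
                out.add(fields[fieldID]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true}; // schema (filepath, f1, f2)
        System.out.println(project("data.csv", "v1\tv2", mask)); // [data.csv, v2]
    }
}
```

Keeping the offset in one helper means the same translation could be reused wherever the mask is consulted, for example in a loader with similar logic such as CSVExcelStorage.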