[ 
https://issues.apache.org/jira/browse/PIG-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944799#comment-14944799
 ] 

Niels Basjes commented on PIG-4689:
-----------------------------------

Quick first analysis of the problem:
In the checkSchema method the schema is stored with the class and the 
udfContextSignature as the 'key' for this class 
([Link|https://github.com/apache/pig/blob/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java#L245]).
Because both instances have the same class AND the same argument list, 
they apparently get the same udfContextSignature, so one of them overwrites 
the schema of the other.

At this point my best guess is that the value Pig passes to 
'setUDFContextSignature' is not unique enough.
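
To make the suspected mechanism concrete, here is a minimal sketch in plain Java. It is only a simplified model and not Pig's actual code: the class FakeStorage, the CONTEXT map, and the signature strings are all hypothetical stand-ins. It assumes the schema is saved under a key built from the class name plus the signature, as described above, and shows that two instances sharing a signature end up with the same header.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the suspected collision; NOT Pig's real implementation.
class SignatureCollisionSketch {

    // Hypothetical stand-in for the properties Pig keeps per (class, signature).
    static final Map<String, String> CONTEXT = new HashMap<>();

    // Hypothetical stand-in for a store function such as CSVExcelStorage.
    static class FakeStorage {
        private final String signature;

        FakeStorage(String signature) {
            this.signature = signature;
        }

        // The key is built only from the class name and the signature.
        private String key() {
            return FakeStorage.class.getName() + "|" + signature;
        }

        // Front end: remember the schema under (class, signature).
        void checkSchema(String schema) {
            CONTEXT.put(key(), schema);
        }

        // Back end: read the schema back in order to write the header line.
        String header() {
            return CONTEXT.get(key());
        }
    }

    // Stores the Foo schema, then the Bar schema, and returns {Foo header, Bar header}.
    static String[] headers(String fooSignature, String barSignature) {
        CONTEXT.clear();
        FakeStorage foo = new FakeStorage(fooSignature);
        FakeStorage bar = new FakeStorage(barSignature);
        foo.checkSchema("a");
        bar.checkSchema("b,c");
        return new String[] { foo.header(), bar.header() };
    }

    public static void main(String[] args) {
        // Same signature for both stores: the second checkSchema call wins.
        String[] shared = headers("sig", "sig");
        System.out.println("shared:   Foo=" + shared[0] + " Bar=" + shared[1]);

        // Distinct signatures: each store keeps its own schema.
        String[] distinct = headers("sig-foo", "sig-bar");
        System.out.println("distinct: Foo=" + distinct[0] + " Bar=" + distinct[1]);
    }
}
```

In this model the shared-signature case gives Foo the header of Bar, which matches the result reported in the issue below; with distinct signatures each output keeps its own header. Whether Pig really hands out identical signatures here still needs to be confirmed.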


> CSV Writes incorrect header if two CSV files are created in one script
> ----------------------------------------------------------------------
>
>                 Key: PIG-4689
>                 URL: https://issues.apache.org/jira/browse/PIG-4689
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Niels Basjes
>
> From a single Pig script I write two completely different and unrelated CSV 
> files; both with the flag 'WRITE_OUTPUT_HEADER'.
> The bug is that both files get the SAME header at the top of the output file 
> even though the data is different.
> *Reproduction:*
> {code:title=foo.txt}
> 1
> {code}
> {code:title=bar.txt (Tab separated)}
> 1     a
> {code}
> {code:title=WriteTwoCSV.pig}
> FOO =
>     LOAD 'foo.txt'
>     USING PigStorage('\t')
>     AS (a:chararray);
> BAR =
>     LOAD 'bar.txt'
>     USING PigStorage('\t')
>     AS (b:chararray, c:chararray);
> STORE FOO into 'Foo'
> USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t','NO_MULTILINE', 
> 'UNIX', 'WRITE_OUTPUT_HEADER');
> STORE BAR into 'Bar'
> USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t','NO_MULTILINE', 
> 'UNIX', 'WRITE_OUTPUT_HEADER');
> {code}
> *Command:*
> {quote}pig -x local WriteTwoCSV.pig{quote}
> *Result:*
> {quote}cat Bar/part-*{quote}
> {code}
> b     c
> 1     a
> {code}
> {quote}cat Foo/part-*{quote}
> {code}
> b     c
> 1
> {code}
> *The error is that the {{Foo}} output has the two-column header from the 
> {{Bar}} output.*
> *One of the effects is that parsing the {{Foo}} data will probably fail due 
> to the varying number of columns*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
