[ 
https://issues.apache.org/jira/browse/PIG-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-1711.
----------------------------------

    Resolution: Fixed

Built In Functions doc updated. 

BinStorage section updated with new information.

Patch will be submitted under PIG-1772.

> Document BinStorage behaviour 
> ------------------------------
>
>                 Key: PIG-1711
>                 URL: https://issues.apache.org/jira/browse/PIG-1711
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Viraj Bhat
>            Assignee: Corinne Chandel
>             Fix For: 0.9.0
>
>
> We need to document some features of BinStorage that can cause indeterminate 
> results.
> I have a Pig script of this type:
> {code}
> raw = load 'sampledata' using BinStorage() as (col1,col2, col3);
> --filter out null columns
> A = filter raw by col1#'bcookie' is not null;
> B = foreach A generate col1#'bcookie'  as reqcolumn;
> describe B;
> --B: {reqcolumn: bytearray}
> X = limit B 5;
> dump X;
> B = foreach A generate (chararray)col1#'bcookie'  as convertedcol;
> describe B;
> --B: {convertedcol: chararray}
> X = limit B 5;
> dump X;
> {code}
> The first dump produces:
> (36co9b55onr8s)
> (36co9b55onr8s)
> (36hilul5oo1q1)
> (36hilul5oo1q1)
> (36l4cj15ooa8a)
> The second dump produces:
> ()
> ()
> ()
> ()
> ()
> We need to document why this happens. One good explanation, according to 
> Alan:
> BinStorage should not track data lineage. When Pig itself uses BinStorage 
> (or whatever) to move data between MR jobs, Pig can figure out the correct 
> cast function to use and apply it. In cases such as this one, where users 
> store data using BinStorage and then read it in a separate Pig Latin script 
> (thus losing the type information), it is the user's responsibility to 
> correctly cast the data before storing it in BinStorage.
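
A minimal sketch of the workaround described above, assuming a hypothetical text input 'rawlogs' whose first column is a map. The writing script casts the value to chararray before storing, so the stored data already carries the intended type and the reading script does not depend on lineage. The relation names, the input path, and the declared schemas are illustrative assumptions, not taken from the original report.

{code}
-- Writer script: cast while Pig still knows how the bytes were produced.
-- 'rawlogs' and its schema are hypothetical.
raw = load 'rawlogs' as (col1:map[], col2, col3);
A = filter raw by col1#'bcookie' is not null;
typed = foreach A generate (chararray)col1#'bcookie' as bcookie;
store typed into 'sampledata' using BinStorage();

-- Reader script (run separately): declare the type that was stored
-- instead of casting a bytearray of unknown origin.
stored = load 'sampledata' using BinStorage() as (bcookie:chararray);
X = limit stored 5;
dump X;
{code}

The point of the split is that the cast happens in the script that still has the lineage information. The reader only restates the type and never asks BinStorage to convert raw bytes.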


        
