[ https://issues.apache.org/jira/browse/PIG-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Corinne Chandel resolved PIG-1711.
----------------------------------
Resolution: Fixed
Built In Functions doc updated.
BinStorage section updated with new information.
Patch will be submitted under PIG-1772.
> Document BinStorage behaviour
> ------------------------------
>
> Key: PIG-1711
> URL: https://issues.apache.org/jira/browse/PIG-1711
> Project: Pig
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.6.0, 0.7.0
> Reporter: Viraj Bhat
> Assignee: Corinne Chandel
> Fix For: 0.9.0
>
>
> We need to document some features of BinStorage that can cause indeterminate
> results.
> I have a Pig script of this type:
> {code}
> raw = load 'sampledata' using BinStorage() as (col1, col2, col3);
> -- filter out rows where the 'bcookie' map value is null
> A = filter raw by col1#'bcookie' is not null;
> -- case 1: project the map value without a cast
> B = foreach A generate col1#'bcookie' as reqcolumn;
> describe B;
> --B: {reqcolumn: bytearray}
> X = limit B 5;
> dump X;
> -- case 2: same projection with an explicit cast to chararray
> B = foreach A generate (chararray)col1#'bcookie' as convertedcol;
> describe B;
> --B: {convertedcol: chararray}
> X = limit B 5;
> dump X;
> {code}
> The first dump produces:
> (36co9b55onr8s)
> (36co9b55onr8s)
> (36hilul5oo1q1)
> (36hilul5oo1q1)
> (36l4cj15ooa8a)
> The second dump produces:
> ()
> ()
> ()
> ()
> ()
> We need to document why this happens. One good explanation, according to
> Alan:
> BinStorage should not track data lineage. When Pig itself uses BinStorage
> (or any other storage function) to move data between MR jobs, Pig can
> figure out the correct cast function and apply it. In cases such as this
> one, where users store data with BinStorage and then read it in a separate
> Pig Latin script (thus losing the type information), it is the user's
> responsibility to cast the data correctly before storing it in
> BinStorage.
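> A minimal sketch of the workaround described above, not taken from the
> original report: apply the cast in the script that first loads the data,
> while Pig still knows which loader produced it, and only then store with
> BinStorage. The paths 'rawdata' and 'sampledata_typed', the use of
> PigStorage for the original input, and the map type on col1 are all
> assumptions made for illustration.
> {code}
> -- Script 1 (assumed input: a text file readable by PigStorage)
> raw = load 'rawdata' using PigStorage() as (col1:map[], col2, col3);
> -- cast here, before the type information is lost
> typed = foreach raw generate (chararray)col1#'bcookie' as bcookie;
> store typed into 'sampledata_typed' using BinStorage();
>
> -- Script 2 (separate run): read the stored values without a later cast
> A = load 'sampledata_typed' using BinStorage() as (bcookie);
> B = filter A by bcookie is not null;
> X = limit B 5;
> dump X;
> {code}
> The second script deliberately avoids casting bcookie after the load,
> since that is the step that returned empty tuples in the report above.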