[ https://issues.apache.org/jira/browse/PIG-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved PIG-768.
----------------------------

    Resolution: Not A Problem

This is the way Pig is supposed to work.  If the loader or the user does not 
tell it what type a column is, it assumes that it is bytearray.  If the script 
later acts as if the column is of a certain type (for example, by applying the 
map dereference operator #), then Pig assumes it really is of that type and 
casts it.

You are right that the loader would do better to return the column as a 
bytearray and then cast it later, when Pig asks it to.  However, since casting 
a value to the type it already has works, what the loader does still works out.
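
If the types are known ahead of time, they can also be declared in the AS 
clause, and DESCRIBE will then report them instead of bytearray.  A sketch of 
the reporter's script, assuming the loader really produces a map in s and a 
bag of single-field tuples in pg (the schema is copied from the report and has 
not been checked against WebDataLoader):

    register WebDataProcessing.jar;
    register opencrawl.jar;
    -- declare the column types in AS instead of leaving them as bytearray;
    -- the types below are assumed from the report, not taken from the loader
    urlMap = LOAD '$input' USING opencrawl.pigudf.WebDataLoader()
             AS (s:map[], pg:bag{t1:tuple(contents:bytearray)}, wm:bytearray);
    DESCRIBE urlMap;
    urlMap2 = LIMIT urlMap 20;
    -- s is now declared as a map, so s#'Url' matches the reported schema
    urlList2 = FOREACH urlMap2 GENERATE s#'Url', pg;
    DESCRIBE urlList2;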

> Schema of a relation reported by DESCRIBE and allowed operations on the 
> relation are not compatible
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-768
>                 URL: https://issues.apache.org/jira/browse/PIG-768
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: George Mavromatis
>             Fix For: 0.9.0
>
>
> The DESCRIBE command in the following script prints:
> {s: bytearray, pg: bytearray, wm: bytearray}
> However, the script later treats the s field of urlMap as a map instead of a 
> bytearray, as shown in s#'Url'.
> Pig does not complain about this contradiction, and at execution time the s 
> field is treated as a map, although it was reported as bytearray at parse time.
> Pig should either not report s as a bytearray or exit with a parsing error.
> Note that all of the above happens before the query executes on the 
> cluster.
> register WebDataProcessing.jar; 
> register opencrawl.jar; 
> urlMap = LOAD '$input' USING opencrawl.pigudf.WebDataLoader() AS (s, pg, wm);
> DESCRIBE urlMap;
> -- in fact the loader in the WebDataProcessing.jar populates s and pg as
> -- s:map[], pg:bag{t1:(contents:bytearray)}
> -- and defines that in determineSchema(), but Pig's DESCRIBE ignores it!
> urlMap2 = LIMIT urlMap 20;
> urlList2 = FOREACH urlMap2 GENERATE s#'Url', pg;
> DESCRIBE urlList2;
> STORE urlList2 INTO 'output2' USING BinStorage();

-- 
This message is automatically generated by JIRA.