[ 
https://issues.apache.org/jira/browse/PIG-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932238#action_12932238
 ] 

Mike Dillon commented on PIG-1718:
----------------------------------

Thanks for the update, Santhosh. Is the semantics cleanup targeted for a 
particular release or milestone? If so, it would be great if this JIRA issue 
could either be included in that milestone, marked as depending on an upstream 
issue, or closed as a duplicate.

> Cannot directly cast output of UDF
> ----------------------------------
>
>                 Key: PIG-1718
>                 URL: https://issues.apache.org/jira/browse/PIG-1718
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>         Environment: Macbook Pro 6.2, Ubuntu 10.04 AMD64, CDH3 beta 3
>            Reporter: Mike Dillon
>            Priority: Minor
>
> I'm in the process of writing a suite of UDFs to deal with nested JSON data 
> inside of Pig. In one case, I created a UDF of type EvalFunc<String> and 
> wanted to use it like so:
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> IN = foreach RAW generate id, ExtractString(json, 'count') as count:int;
> {code}
> When I do this, I get the following error:
> {quote}
> ERROR 1022: Type mismatch merging schema prefix. Field Schema: chararray. 
> Other Field Schema: count: int
> {quote}
> I can work around it by adding another projection with just a cast (as 
> below), but I'd prefer if the first form just worked.
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> MID = foreach RAW generate id, ExtractString(json, 'count') as count;
> IN = foreach MID generate id, (int)count;
> {code}
> I'd prefer not to have to write an ExtractInteger extends EvalFunc<Integer> if 
> I can avoid it. In our case, it gets even more cumbersome because we want to 
> have something like ExtractStringTuple extends EvalFunc<Tuple> that returns a 
> tuple of strings without parsing the JSON over and over again:
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> IN = foreach RAW generate id, ExtractStringTuple(json, 'name', 'count', 
> 'mean') as (name, count:int, mean:double);
> {code}
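> Until something like that is supported, the two-step workaround above could 
> presumably be extended to the tuple case. This is an untested sketch: 
> ExtractStringTuple is the hypothetical UDF described above, and it is assumed 
> to declare an output schema of chararray fields so that the casts are legal.
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> -- First projection: flatten the tuple and name its fields, without types.
> MID = foreach RAW generate id, flatten(ExtractStringTuple(json, 'name', 
> 'count', 'mean')) as (name, count, mean);
> -- Second projection: apply the casts that the AS clause would not accept.
> IN = foreach MID generate id, name, (int)count as count, (double)mean as mean;
> {code}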
> As indicated, I have tested this with Pig 0.7.0. My apologies if this is 
> already fixed in 0.8 since I was not able to test with a newer version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
