[ https://issues.apache.org/jira/browse/PIG-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932287#action_12932287 ]
Mike Dillon commented on PIG-1718:
----------------------------------
The table on that wiki page says that the changes required for this JIRA are
backwards incompatible, but that only applies to the first of the two cleanup
possibilities (i.e. removing the ability to have types in the AS clause). If
the second option is chosen instead, making the schema declared by an AS
clause act like an implied cast, then there is no backwards compatibility
problem: any script that currently puts types in the AS clause only works if
those types exactly match the source types. Anyone who tried to rely on
conversion (as I did) would have gotten an error.
Incidentally, I *hugely* prefer the backwards compatible option of allowing an
implied coercion in this case.
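To make the difference between the two options concrete, here is a minimal
plain-Java sketch of what "AS clause as implied cast" could mean for the
ExtractStringTuple case from the issue. It is a hypothetical model, not Pig's
implementation: the class ImpliedCastSketch, the PigType enum and the
cast/applyAsClause methods are made up for illustration, only three Pig types
are modelled, and returning null for an unparseable value is an assumption
meant to loosely resemble how Pig treats failed casts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical model (NOT Pig's implementation) of the "implied cast" option:
 * each value produced by a UDF is coerced to the type declared in the AS
 * clause instead of failing with a schema-mismatch error.
 */
public class ImpliedCastSketch {

    // Only the Pig types used in the issue's examples are modelled here.
    enum PigType { CHARARRAY, INT, DOUBLE }

    // Coerce one value to its declared type. Assumption: an unparseable
    // value becomes null rather than aborting the whole script.
    static Object cast(Object value, PigType declared) {
        if (value == null) {
            return null;
        }
        String s = value.toString();
        try {
            switch (declared) {
                case INT:
                    return Integer.valueOf(s);
                case DOUBLE:
                    return Double.valueOf(s);
                default:
                    return s;
            }
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Apply a declared AS-clause schema field by field.
    static List<Object> applyAsClause(List<?> values, List<PigType> schema) {
        if (values.size() != schema.size()) {
            throw new IllegalArgumentException(
                "arity mismatch: " + values.size() + " vs " + schema.size());
        }
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            out.add(cast(values.get(i), schema.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        // What ExtractStringTuple(json, 'name', 'count', 'mean') might return:
        List<String> udfOutput = Arrays.asList("widget", "42", "3.5");
        // What the AS clause declares: (name, count:int, mean:double)
        List<PigType> declared =
            Arrays.asList(PigType.CHARARRAY, PigType.INT, PigType.DOUBLE);
        System.out.println(applyAsClause(udfOutput, declared)); // prints [widget, 42, 3.5]
    }
}
```

Under the current exact-match rule, the same declared schema applied to three
chararray values is rejected with a type-mismatch error like the ERROR 1022
quoted below; under the implied-cast option it would behave like the explicit
(int) and (double) casts in the workaround.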
> Cannot directly cast output of UDF
> ----------------------------------
>
> Key: PIG-1718
> URL: https://issues.apache.org/jira/browse/PIG-1718
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.7.0
> Environment: Macbook Pro 6.2, Ubuntu 10.04 AMD64, CDH3 beta 3
> Reporter: Mike Dillon
> Priority: Minor
>
> I'm in the process of writing a suite of UDFs to deal with nested JSON data
> inside of Pig. In one case, I created a UDF of type EvalFunc<String> and
> wanted to use it like so:
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> IN = foreach RAW generate id, ExtractString(json, 'count') as count:int;
> {code}
> When I do this, I get the following error:
> {quote}
> ERROR 1022: Type mismatch merging schema prefix. Field Schema: chararray.
> Other Field Schema: count: int
> {quote}
> I can work around it by adding another projection with just a cast (as
> below), but I'd prefer it if the form I tried first just worked.
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> MID = foreach RAW generate id, ExtractString(json, 'count') as count;
> IN = foreach MID generate id, (int)count as count;
> {code}
> I'd prefer not to have to write an ExtractInteger extends EvalFunc<Integer>
> if I can avoid it. In our case, it gets even more cumbersome because we want to
> have something like ExtractStringTuple extends EvalFunc<Tuple> that returns a
> tuple of strings without parsing the JSON over and over again:
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> IN = foreach RAW generate id, ExtractStringTuple(json, 'name', 'count',
> 'mean') as (name, count:int, mean:double);
> {code}
> As indicated, I have tested this with Pig 0.7.0. My apologies if this is
> already fixed in 0.8; I was not able to test with a newer version.