[ https://issues.apache.org/jira/browse/PIG-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031029#comment-14031029 ]

Rohini Palaniswamy commented on PIG-3938:
-----------------------------------------

[~polisan],
  I don't think the simple script in the description reproduces the exact issue
the user faced.

> A = load 'input.txt' using PigStorage(' ') as (id:chararray, kv:[]);
 The schema of the map in this case would be key:chararray, value:bytearray.
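To make the distinction concrete, here is a toy Java model (not Pig's actual loader code) of how a map literal such as `[x#1,y#ab]` ends up with untyped values under `kv:[]` but string values under `kv:[chararray]`. `RawBytes` is a hypothetical stand-in for `org.apache.pig.data.DataByteArray`, and `parseMap` is a simplified, hypothetical parser that assumes a flat map with no escaping, nesting, or nulls.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapSchemaDemo {
    // Hypothetical stand-in for org.apache.pig.data.DataByteArray: raw, untyped bytes.
    static final class RawBytes {
        final byte[] bytes;
        RawBytes(byte[] bytes) { this.bytes = bytes; }
        @Override public String toString() { return new String(bytes, StandardCharsets.UTF_8); }
    }

    // Simplified parser for a flat Pig-style map literal like "[x#1,y#ab]".
    // typedAsChararray == false models the schema kv:[]: values stay as RawBytes.
    // typedAsChararray == true models kv:[chararray]: values become Strings at load time.
    static Map<String, Object> parseMap(String literal, boolean typedAsChararray) {
        String body = literal.substring(1, literal.length() - 1); // strip '[' and ']'
        Map<String, Object> result = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return result;
        }
        for (String entry : body.split(",")) {
            int hash = entry.indexOf('#');
            String key = entry.substring(0, hash);
            String rawValue = entry.substring(hash + 1);
            Object value = typedAsChararray
                ? rawValue
                : new RawBytes(rawValue.getBytes(StandardCharsets.UTF_8));
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> untyped = parseMap("[x#1,y#ab]", false);
        Map<String, Object> typed = parseMap("[x#1,y#ab]", true);
        // Prints "RawBytes": the value type is bytearray under kv:[]
        System.out.println(untyped.get("y").getClass().getSimpleName());
        // Prints "String": the value type is chararray under kv:[chararray]
        System.out.println(typed.get("y").getClass().getSimpleName());
    }
}
```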
> E = foreach B generate id, flatten(to_bag) as (key:chararray, value:chararray);
  The AS clause does not work in FOREACH GENERATE (PIG-2315), so you have to
cast explicitly.
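The difference between the two can be sketched as a toy Java model (not Pig's real implementation). Per the point above, `asClauseRetype` only changes the declared type, as the AS clause does, while `explicitCast` actually converts the bytes, like `(chararray)value`. `stringMin` assumes, based on the reporter's stack trace, that `StringMin` casts each value straight to `java.lang.String`. `RawBytes`, `Field`, and all method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CastDemo {
    // Hypothetical stand-in for org.apache.pig.data.DataByteArray.
    static final class RawBytes {
        final byte[] bytes;
        RawBytes(String s) { this.bytes = s.getBytes(StandardCharsets.UTF_8); }
    }

    // A runtime value paired with the type the schema claims it has.
    static final class Field {
        final String declaredType;
        final Object value;
        Field(String declaredType, Object value) {
            this.declaredType = declaredType;
            this.value = value;
        }
    }

    // Schema-only retyping (the AS clause in FOREACH GENERATE): the declared
    // type changes, but the runtime object is left untouched.
    static Field asClauseRetype(Field f) {
        return new Field("chararray", f.value);
    }

    // Explicit cast, analogous to (chararray)value: the object is converted.
    static Field explicitCast(Field f) {
        String s = (f.value instanceof RawBytes)
            ? new String(((RawBytes) f.value).bytes, StandardCharsets.UTF_8)
            : (String) f.value;
        return new Field("chararray", s);
    }

    // Mimics a string MIN chosen because the declared type is chararray:
    // it casts every value to String without checking the runtime class.
    static String stringMin(List<Field> fields) {
        String min = null;
        for (Field f : fields) {
            String s = (String) f.value; // ClassCastException if f.value is RawBytes
            if (min == null || s.compareTo(min) < 0) {
                min = s;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        List<Field> viaAs = new ArrayList<>();
        List<Field> viaCast = new ArrayList<>();
        for (String v : new String[] {"ab", "cd"}) {
            Field loaded = new Field("bytearray", new RawBytes(v));
            viaAs.add(asClauseRetype(loaded));
            viaCast.add(explicitCast(loaded));
        }
        try {
            stringMin(viaAs);
        } catch (ClassCastException e) {
            System.out.println("AS clause only: ClassCastException");
        }
        System.out.println("explicit cast: " + stringMin(viaCast)); // explicit cast: ab
    }
}
```

In this model only a real conversion changes the runtime object, which is why either the typed map schema or the explicit cast would be expected to avoid the error.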

In the user's case, it still did not work after doing the equivalent of
kv:[chararray] in LOAD (Comment 14) and doing an explicit type cast in FOREACH
after flattening the bag (Comment 1). Either one of these alone should have
worked, so there are two issues to be investigated and fixed.

> Type cast doesn't work after flatten result of UDF
> --------------------------------------------------
>
>                 Key: PIG-3938
>                 URL: https://issues.apache.org/jira/browse/PIG-3938
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.12.0, 0.11.1
>            Reporter: Hongchang Li
>
> This ticket is very close to
> http://stackoverflow.com/questions/8828839/how-can-correct-data-types-on-apache-pig-be-enforced.
> To reproduce the issue, we first need a UDF that converts a map to a bag, with code much like
> http://stackoverflow.com/questions/12476929/group-key-value-of-map-in-pig?answertab=votes#tab-top
> {code:title=test.pig}
> register polisan/maptobag.jar;
> define MAPTOBAG maptobag.MAPTOBAG();
> A = load 'polisan/input1.txt' using PigStorage(' ') as (id:chararray, kv:[]);
> B = foreach A generate id, MAPTOBAG(kv) as to_bag;
> C = foreach B generate id, flatten(to_bag) as (key:chararray, value:chararray);
> D = group C by (id, key);
> E = foreach D generate group, MIN(C.value);
> dump E;
> {code}
> {code:title=polisan/input1.txt}
> 1 [x#1,y#ab]
> 1 [x#2,y#cd]
> {code}
> Then, running the Pig script, I got the following exception:
> {noformat}
> 2014-05-15 19:44:52,944 [Thread-2] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: D: Local Rearrange[tuple]{tuple}(false) - scope-42 Operator Key: scope-42): org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing min in Initial
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:263)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:1)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing min in Initial
>       at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:81)
>       at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:1)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:352)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:391)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
>       ... 8 more
> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
>       at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:73)
>       ... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
