[ https://issues.apache.org/jira/browse/PIG-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-3938:
------------------------------
    Attachment: pig-3938-v01.patch

Attaching my first take, {{pig-3938-v01.patch}}.

I'm using an ugly "instanceof", assuming I cannot create a new interface for getLoadCaster(). (I'm not sure whether a new interface would break backward compatibility, but it would at least require recompiling all the loaders, which is probably a non-starter.)

For EvalFunc.getLoadCaster, this patch has two options.
* If the EvalFunc provides the loadcaster, use that.
* Else, check the loadcasters used for all the params passed and see if they are identical. If yes, use that.

The latter is needed because not every EvalFunc knows which loadcaster to use. For example, a builtin udf like TOTUPLE simply adds a Tuple wrapper but does not touch the data itself. For this case, it needs a way to forward the loadcaster used by the parameters. (This applies to the IdentityColumn udf as well.)

> Add LoadCaster to EvalFunc(UDF)
> --------------------------------
>
>                 Key: PIG-3938
>                 URL: https://issues.apache.org/jira/browse/PIG-3938
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.12.0, 0.11.1
>            Reporter: Hongchang Li
>            Assignee: Koji Noguchi
>         Attachments: pig-3938-v01.patch
>
>
> this ticket was very close to http://stackoverflow.com/questions/8828839/how-can-correct-data-types-on-apache-pig-be-enforced.
> To reproduce the issue, first, we have an UDF to cast map to bag, code almost like (http://stackoverflow.com/questions/12476929/group-key-value-of-map-in-pig?answertab=votes#tab-top)
> {code:title=test.pig}
> $ cat test.pig
> register polisan/maptobag.jar;
> define MAPTOBAG maptobag.MAPTOBAG();
> A = load 'polisan/input1.txt' using PigStorage(' ') as (id:chararray, kv:[]);
> B = foreach A generate id, MAPTOBAG(kv) as to_bag;
> C = foreach B generate id, flatten(to_bag) as (key:chararray, value:chararray);
> D = group C by (id, key);
> E = foreach D generate group, MIN(C.value);
> dump E;
> {code}
> {code:title=polisan/input1.pig}
> 1 [x#1,y#ab]
> 1 [x#2,y#cd]
> {code}
> then run the pig, I got exception as following:
> {noformat}
> 2014-05-15 19:44:52,944 [Thread-2] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: D: Local Rearrange[tuple]{tuple}(false) - scope-42 Operator Key: scope-42): org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing min in Initial
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:263)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:1)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing min in Initial
>     at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:81)
>     at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:1)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:352)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:391)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
>     ... 8 more
> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
>     at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:73)
>     ... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
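
For readers following the two-option rule in the comment at the top of this message (use the loadcaster the EvalFunc provides, else forward the one shared by all params), here is a minimal sketch of that resolution logic. It is not taken from pig-3938-v01.patch: the `Caster` and `Udf` types and the `resolve` method are hypothetical stand-ins so the example runs without Pig on the classpath, and treating "identical" as `equals` is an assumption.

```java
import java.util.List;

public class LoadCasterResolution {
    /** Hypothetical stand-in for Pig's LoadCaster; not the real interface. */
    interface Caster {}

    /** Hypothetical stand-in for a UDF; returns null when it names no caster itself. */
    interface Udf {
        Caster ownCaster();
    }

    /**
     * Option 1: if the UDF provides a caster, use it.
     * Option 2: otherwise, if every parameter uses the same caster, forward that one.
     * Returns null when neither option applies.
     */
    static Caster resolve(Udf udf, List<Caster> paramCasters) {
        Caster own = udf.ownCaster();
        if (own != null) {
            return own;
        }
        if (paramCasters.isEmpty()) {
            return null;
        }
        Caster first = paramCasters.get(0);
        for (Caster c : paramCasters) {
            // Assumption: "identical" is modeled as equals(); the patch may compare differently.
            if (!first.equals(c)) {
                return null;
            }
        }
        return first;
    }

    public static void main(String[] args) {
        Caster utf8 = new Caster() {};
        Caster binary = new Caster() {};
        Udf passThrough = () -> null;   // wrapper-style UDF that does not touch the data
        Udf explicit = () -> binary;    // UDF that names its own caster

        System.out.println(resolve(explicit, List.of(utf8)) == binary);          // prints true
        System.out.println(resolve(passThrough, List.of(utf8, utf8)) == utf8);   // prints true
        System.out.println(resolve(passThrough, List.of(utf8, binary)) == null); // prints true
    }
}
```

The pass-through case corresponds to the TOTUPLE example in the comment: the UDF only wraps its input, so the caster of the parameters is forwarded unchanged.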