[ https://issues.apache.org/jira/browse/PIG-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253020#comment-13253020 ]
Dmitriy V. Ryaboy commented on PIG-1965:
----------------------------------------

That's because saying that a string is an int is not sufficient to turn it into an int.

This should work:

{code}
c = foreach a generate FLATTEN (REGEX_EXTRACT_ALL (line, '.* f1:(\\S+) .* f2:(\\d+).*')) as (f1:chararray, f2:chararray);
d = foreach c generate f1, (int) f2;
e = foreach (group d by f1) generate group, AVG(d.f2);
dump e
{code}

> Aggregation not working in conjunction with REGEX_EXTRACT_ALL
> -------------------------------------------------------------
>
>                 Key: PIG-1965
>                 URL: https://issues.apache.org/jira/browse/PIG-1965
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Sven Krasser
>
> For the script below, the AVG function fails with an exception:
> a = load 'bug.txt' using TextLoader() as (line:chararray);
> c = foreach a generate FLATTEN (REGEX_EXTRACT_ALL (line, '.* f1:(\\S+) .* f2:(\\d+).*')) as (f1:chararray, f2:int);
> describe c; -- this is c: {f1: chararray,f2: int}
> d = group c by f1;
> e = foreach d generate group, AVG(c.f2);
> dump e
> Data for bug.txt:
> a f1:abc d f2:12
> f f1:def d f2:23
> w f1:abc w f2:45
> r f1:abc w f2:24
> e f1:def q f2:34
> If I store the data after parsing it, the schema stays the same, but now the AVG function works.
> Workaround script:
> a = load 'bug.txt' using TextLoader() as (line:chararray);
> b = foreach a generate FLATTEN (REGEX_EXTRACT_ALL (line, '.* f1:(\\S+) .* f2:(\\d+).*')) as (f1:chararray, f2:int);
> describe b; -- this is b: {f1: chararray,f2: int}
> store b into 'temp';
> c = load 'temp' as (f1:chararray, f2:int);
> describe c; -- this is c: {f1: chararray,f2: int}
> d = group c by f1;
> e = foreach d generate group, AVG(c.f2);
> dump e
> Exception in the error case:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing average in Initial
>     at org.apache.pig.builtin.IntAvg$Initial.exec(IntAvg.java:100)
>     at org.apache.pig.builtin.IntAvg$Initial.exec(IntAvg.java:75)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:273)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:343)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
>     at org.apache.pig.builtin.IntAvg$Initial.exec(IntAvg.java:87)
>     ... 14 more
> [...]
> 2011-04-05 18:18:33,865 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> 2011-04-05 18:18:33,866 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-04-05 18:18:33,869 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias e
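
As a rough illustration of the point made in the comment (a Python analogue, not Pig, and not taken from the issue): regex capture groups come back as strings, and labeling one of them an int does not convert it. An explicit conversion has to happen before the values are averaged. The helper names extract_fields and average_by_f1 are hypothetical; the sample rows are the bug.txt data from the issue description.

```python
import re
from collections import defaultdict

# Sample rows copied from the bug.txt data in the issue description.
LINES = [
    "a f1:abc d f2:12",
    "f f1:def d f2:23",
    "w f1:abc w f2:45",
    "r f1:abc w f2:24",
    "e f1:def q f2:34",
]

# Same pattern as in the Pig script; a Python raw string needs only
# single backslashes.
PATTERN = re.compile(r".* f1:(\S+) .* f2:(\d+).*")


def extract_fields(line):
    """Return the (f1, f2) capture groups, or None if the line does not match.

    Both values are plain strings at this point, whatever name or type
    label is attached to them later.
    """
    match = PATTERN.fullmatch(line)
    return match.groups() if match else None


def average_by_f1(lines):
    """Group by f1 and average f2, converting f2 explicitly first."""
    groups = defaultdict(list)
    for line in lines:
        fields = extract_fields(line)
        if fields is None:
            continue
        f1, f2 = fields
        # Explicit conversion: the counterpart of the (int) cast
        # suggested in the comment.
        groups[f1].append(int(f2))
    return {key: sum(values) / len(values) for key, values in groups.items()}


if __name__ == "__main__":
    print(extract_fields(LINES[0]))  # ('abc', '12'), both strings
    print(average_by_f1(LINES))      # {'abc': 27.0, 'def': 28.5}
```

Skipping the int(f2) call and summing the raw strings fails with a TypeError in Python, which is loosely comparable to the ClassCastException in the stack trace above.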