Hello,
I have a question about handling dates in Pig.
My input is a comma-delimited file containing a timestamp field in
'yyyymmdd hh:mm:ss' format (e.g. 20090201 14:42:00). I want to aggregate to
day level by extracting only the date portion of the timestamp (yyyymmdd,
e.g. 20090201). I have been experimenting with the tokenize function, but I
am unclear how to accomplish an aggregation by date.
What am I doing wrong? How can I get a date-level aggregation?
Is there a 'Date' data type?
Here are the details:
Input Data:
4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1
Desired Output (sum of the third field per day):
20090201, 122
20090202, 130
--My attempt in Pig:
A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
describe A;
B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
describe B;
C = group B by B.date;
describe C;
D = foreach C generate B.date, SUM(A.v3);
dump D;
Grunt session:
grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
... 5 more
2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
grunt>
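One possible direction, as an untested sketch rather than a confirmed fix: the
error above points at 'group', which is only defined in a relation produced by
a GROUP statement, so it cannot appear in a foreach over A. The sketch below
assumes a Pig release that ships a built-in SUBSTRING function (the version
that produced the log above may not have it) and uses it in place of tokenize,
which returns a bag of tokens rather than separate date and time fields. The
alias names ymd and total, and the types in the schema, are illustrative
choices, not anything taken from the original script.

```pig
-- ASSUMPTION: typed schemas and a built-in SUBSTRING are available
-- in the Pig release being used.

-- Load with explicit types so v3 can be summed as a number.
A = LOAD 'atest.csv' USING PigStorage(',')
    AS (v1:int, v2:chararray, v3:int, v4:int);

-- Keep only the first 8 characters of the timestamp (yyyymmdd).
-- Fields of A are referenced directly (v2), not as A.v2.
B = FOREACH A GENERATE SUBSTRING(v2, 0, 8) AS ymd, v3;

-- Group by the extracted date; the key is exposed as 'group' in C.
C = GROUP B BY ymd;

-- Sum v3 over the bag of B tuples collected for each date.
D = FOREACH C GENERATE group AS ymd, SUM(B.v3) AS total;
DUMP D;
```

With the four sample rows above, D should contain one tuple per day carrying
the totals listed under Desired Output.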
Thanks in advance,
Avram