Hello,

I have a question about handling dates in Pig.

My input is a comma-delimited file containing a timestamp field in
'yyyymmdd hh:mm:ss' format (e.g. 20090201 14:42:00). I want to aggregate to
day level using only the date portion of the timestamp (yyyymmdd, e.g.
20090201). I have been experimenting with the tokenize function, but I am
unclear how to aggregate by date.

What am I doing wrong? How can I get a date-level aggregation?
Is there a 'Date' data type?


Here are the details:


Input Data:

4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1

Desired Output:
20090201, 122
20090202, 130
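
In case it helps pin down what I am after, here is the same aggregation as a
small Python sketch (Python rather than Pig, purely to illustrate the intended
semantics). The function name aggregate_by_day is made up for illustration, and
it assumes the date is always the part of the second field before the first
space and that the third field is an integer.

```python
from collections import defaultdict


def aggregate_by_day(lines):
    """Sum the third field per day, where the day is the yyyymmdd part
    of the 'yyyymmdd hh:mm:ss' timestamp in the second field."""
    totals = defaultdict(int)
    for line in lines:
        v1, v2, v3, v4 = line.strip().split(",")
        day = v2.split(" ")[0]  # '20090201 23:59:56' -> '20090201'
        totals[day] += int(v3)
    # Sort by day so the output order is deterministic.
    return dict(sorted(totals.items()))


# The sample input rows from this message.
rows = [
    "4,20090201 23:59:56,8,1",
    "3,20090202 23:59:56,101,1",
    "4,20090201 23:59:56,114,1",
    "5,20090202 23:59:56,29,1",
]

for day, total in aggregate_by_day(rows).items():
    print(f"{day}, {total}")
```

On the four sample rows this prints the two lines shown under Desired Output.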

--My attempt in Pig:
A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
describe A;
B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
describe B;
C = group B by B.date;
describe C;
D = foreach C generate B.date, SUM(A.v3);
dump D;


grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
        at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
        at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
        ... 5 more

2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
grunt>


Thanks in advance,
Avram
