Hi. I started experimenting with Hive today since it seems to suit us quite well: we already process our weblog stats with Hadoop and end up doing SQL-style work on the Hadoop output anyway, so it seems fair to try out a system that does that in one step :)
I've created and loaded data into Hive with the following statements:

hive> drop table DailyUniqueSiteVisitorSample;
OK
Time taken: 4.064 seconds

hive> CREATE TABLE DailyUniqueSiteVisitorSample (
        sampleDate date,
        uid bigint,
        site int,
        concreteStatistics int,
        network smallint,
        category smallint,
        country smallint,
        countryCode String,
        sessions smallint,
        pageImpressions smallint)
      COMMENT 'This is our weblog stats table'
      PARTITIONED BY (dt STRING)
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n'
      STORED AS TEXTFILE;
OK
Time taken: 0.248 seconds

hive> LOAD DATA LOCAL INPATH '/tmp/data-DenormalizedSiteVisitor.VisitsPi.2009-03-02.csv' INTO TABLE DailyUniqueSiteVisitorSample PARTITION (dt='2009-03-02');
Copying data from file:/tmp/data-2009-03-02.csv
Loading data to table dailyuniquesitevisitorsample partition {dt=2009-03-02}
OK
Time taken: 2.258 seconds

I was a little confused about the STORED AS TEXTFILE part (the tutorial only uses SequenceFiles), but since the CSV I need to load is a plain text file I went with it, and the load seems to work.

Anyway, this all goes well, but when I issue a simple query like the one below it throws an exception:

hive> select d.* from dailyuniquesitevisitorsample d where d.site=1;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.lang.AbstractMethodError: org.apache.hadoop.hive.ql.io.HiveInputFormat.validateInput(Lorg/apache/hadoop/mapred/JobConf;)V
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:735)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:391)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:239)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:306)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

I'm running Hadoop 0.18.2.

Not sure that I'm doing this correctly. Please point it out if I'm doing something stupid.

Kindly
//Marcus

--
Marcus Herou
CTO and co-founder
Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/
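
P.S. In case it helps, this is roughly what I intend to run next to double-check that the partition metadata looks sane (just my own sanity checks based on the wiki, I haven't looked at the output yet):

hive> SHOW PARTITIONS dailyuniquesitevisitorsample;
hive> DESCRIBE EXTENDED dailyuniquesitevisitorsample PARTITION (dt='2009-03-02');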