InputFormat version problem.

Tianqiang Li Tue, 21 Sep 2010 21:07:53 -0700

Hi,
I have a custom InputFormat class that reads our log format in our Hadoop jobs
and in Pig. It is built on the Hadoop 0.20 API. Now I'd like to reuse this
InputFormat to load data into a Hive table by specifying the InputFormat and
a SerDe when I create the table, as below:


CREATE TABLE rawlog_test (
  user_id  STRING,
  tag  STRING,
  my_timestamp  STRING )
ROW FORMAT SERDE 'x.y.z.mySerDe'
STORED AS INPUTFORMAT 'x.y.z.myInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat' ;

Then I run:
load data inpath '/rawlog.txt' into table rawlog_test;

No error shows up on screen, but I found that the deserialize function never
gets called. And when I run select * from rawlog_test; an error is thrown:
---
FAILED: Error in semantic analysis: line 1:14 Input Format must implement
InputFormat rawlog_test
---

I searched for this online and found that it might be related to Hive using
the old (0.17) InputFormat API. Does anybody know whether there is a way to get
the 0.20 API working with Hive? Adapting my code to the old API would take a
lot of work, and even if I got it done, maintaining two versions of the code
seems unnecessary (Pig 0.7 works well with my 0.20 version of the InputFormat,
and we need to use Pig and Hive in different situations). Is there any way I
can work around this? My version of Hive is 0.7, and Hadoop is 0.20.1 from
CDH2. Thanks.
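
The error above is consistent with Hive checking that the table's input format
class implements the old-API interface org.apache.hadoop.mapred.InputFormat,
whereas a class written against the 0.20 API extends
org.apache.hadoop.mapreduce.InputFormat. As a minimal sketch (not Hive's actual
code), the Java helper below reports which of the two a class is built against.
The names InputFormatApiCheck, load and describe are hypothetical and used only
for illustration; a real run assumes the Hadoop jars and the custom class are
on the classpath.

```java
// Hypothetical helper for illustration; not part of Hadoop or Hive.
class InputFormatApiCheck {
    // Old ("mapred") API interface, the one Hive is assumed to require here.
    static final String OLD_API = "org.apache.hadoop.mapred.InputFormat";
    // New ("mapreduce") API abstract class introduced with Hadoop 0.20.
    static final String NEW_API = "org.apache.hadoop.mapreduce.InputFormat";

    // Loads a class by name without initializing it; returns null if it
    // cannot be found or linked on the current classpath.
    static Class<?> load(String name) {
        try {
            return Class.forName(name, false,
                    InputFormatApiCheck.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    // Returns "old", "new", "neither", or "not-found" for the given class name.
    static String describe(String className) {
        Class<?> candidate = load(className);
        if (candidate == null) {
            return "not-found";
        }
        Class<?> oldApi = load(OLD_API);
        Class<?> newApi = load(NEW_API);
        if (oldApi != null && oldApi.isAssignableFrom(candidate)) {
            return "old";
        }
        if (newApi != null && newApi.isAssignableFrom(candidate)) {
            return "new";
        }
        return "neither";
    }

    public static void main(String[] args) {
        // Each argument is a fully qualified class name to inspect.
        for (String name : args) {
            System.out.println(name + ": " + describe(name));
        }
    }
}
```

Running it with x.y.z.myInputFormat as the argument and the relevant jars on
the classpath would be expected to report "new" for an InputFormat written
against the 0.20 API, which would match the symptom described above.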

Regards,
Peter
