Hi, I have a custom InputFormat class that reads our log format in our Hadoop jobs and in Pig. It is built on the Hadoop 0.20 API. Now I'd like to reuse this InputFormat to load data into a Hive table by specifying the InputFormat and a SerDe when I create the table, like below:

CREATE TABLE rawlog_test (
  user_id STRING,
  tag STRING,
  my_timestamp STRING
)
ROW FORMAT SERDE 'x.y.z.mySerDe'
STORED AS
  INPUTFORMAT 'x.y.z.myInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat';

Then I run:

load data inpath '/rawlog.txt' into table rawlog_test;

No error shows up on screen, but I found that the deserialize function never gets called. And when I run:

select * from rawlog_test;

an error is thrown:

FAILED: Error in semantic analysis: line 1:14 Input Format must implement InputFormat rawlog_test

I searched for this on the internet and found it might be related to Hive using the old InputFormat API (0.17). Does anybody know whether there is a way to get the 0.20 API working with Hive? Adapting my code to the old API would take a lot of work, and even if I got it done, maintaining two versions of the code seems unnecessary. (Pig 0.7 works well with my 0.20 version of the InputFormat, and we need to use Pig and Hive in different situations.) Is there any way I can work around this?

My version of Hive is 0.7, and Hadoop is 0.20.1 from CDH2.

Thanks.

Regards,
Peter
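
P.S. In case it helps anyone reproduce this: the error reads like a plain type check, so below is a minimal sketch of how one could confirm which API a given class belongs to. It assumes that Hive tests against the old-API interface org.apache.hadoop.mapred.InputFormat (I have not verified this in the Hive source), while my class extends the new-API abstract class org.apache.hadoop.mapreduce.InputFormat. The class name InputFormatApiCheck and the helper isSubtype are hypothetical names made up for this illustration.

```java
public class InputFormatApiCheck {
    // Old (mapred) API interface; ASSUMED to be what Hive checks against.
    static final String OLD_API = "org.apache.hadoop.mapred.InputFormat";
    // New (mapreduce) API abstract class, introduced with Hadoop 0.20.
    static final String NEW_API = "org.apache.hadoop.mapreduce.InputFormat";

    /**
     * Returns true if className names a class that is the same as, or a
     * subtype of, typeName. Returns false if either class cannot be loaded.
     */
    static boolean isSubtype(String className, String typeName) {
        ClassLoader loader = InputFormatApiCheck.class.getClassLoader();
        try {
            // 'false' skips static initialization; only the type is needed.
            Class<?> candidate = Class.forName(className, false, loader);
            Class<?> expected = Class.forName(typeName, false, loader);
            return expected.isAssignableFrom(candidate);
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the placeholder class name used in the CREATE TABLE above.
        String cls = args.length > 0 ? args[0] : "x.y.z.myInputFormat";
        System.out.println(cls + " implements old API: " + isSubtype(cls, OLD_API));
        System.out.println(cls + " extends new API: " + isSubtype(cls, NEW_API));
    }
}
```

It would need to be run with the Hadoop jars and the jar containing the InputFormat on the classpath, otherwise both checks simply report false.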