Hi Edward,

Thanks for your hints. Let me start with the old API first. Just curious: does Hive plan to support the 0.20 API?
-Peter

On Tue, Sep 21, 2010 at 9:17 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Wed, Sep 22, 2010 at 12:08 AM, Tianqiang Li <peter...@gmail.com> wrote:
> > Hi,
> > I have a customized InputFormat class to read our log format in our Hadoop
> > jobs and Pig, which is built on top of the Hadoop 0.20 API. Now I'd like to
> > re-use this InputFormat to load data into a Hive table by specifying an
> > InputFormat and a SerDe when I create the table, like below:
> >
> > CREATE TABLE rawlog_test (
> >   user_id STRING,
> >   tag STRING,
> >   my_timestamp STRING )
> > ROW FORMAT SERDE 'x.y.z.mySerDe'
> > STORED AS INPUTFORMAT 'x.y.z.myInputFormat'
> > OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat' ;
> >
> > Then I run:
> > load data inpath '/rawlog.txt' into table rawlog_test;
> >
> > No error shows up on screen, but I found the deserialize function never got
> > called. And when I run select * from rawlog_test; an error is thrown:
> >
> > FAILED: Error in semantic analysis: line 1:14 Input Format must implement
> > InputFormat rawlog_test
> >
> > I searched for this on the internet and found it might be related to Hive
> > using the old API (0.17) of InputFormat. Does anybody know whether there is
> > a way to get the 0.20 API working with Hive? Adapting my code to the old API
> > needs lots of work, and even if I get it done, maintaining two versions of
> > the code sounds a bit unnecessary (Pig 0.7 works well with my 0.20 version
> > of the InputFormat, and we need to use Pig and Hive in different
> > situations). Is there any way I can work around this? My version of Hive is
> > 0.7, and Hadoop is 0.20.1 from CDH2. Thanks.
> >
> > Regards,
> > Peter
>
> You can make a 0.20 InputFormat work with Hive, but it's a real PITA. The
> HBase and Cassandra handlers both do it. Essentially you have to extend
> the new mapreduce input format and then implement methods in the old
> one, use final variables and chained method calls.
> Example here:
>
> https://issues.apache.org/jira/secure/attachment/12452140/hive-1434-4-patch.txt
>
> Essentially, if your input format is simple enough, it is likely
> easier to write two separate classes, one for the old API and one for the
> new. Use the mapred.* InputFormat with Hive.
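
If the two-class route is taken, one way to limit the duplicated code is to keep the actual line parsing in a plain Java class that both the mapred.* and the mapreduce.* record readers call. A minimal sketch, assuming tab-separated lines carrying the three columns from the CREATE TABLE statement; the class name RawLogParser, the Record holder, and the delimiter are all hypothetical and not part of Hive or Hadoop:

```java
/**
 * Hypothetical API-neutral parser for one raw log line.
 * It has no Hadoop dependency, so an old-API (mapred.*) RecordReader
 * and a new-API (mapreduce.*) RecordReader could both delegate to it.
 * Assumption: fields are tab-separated in the order
 * user_id, tag, my_timestamp.
 */
final class RawLogParser {

    /** Immutable holder for the three parsed fields. */
    static final class Record {
        final String userId;
        final String tag;
        final String timestamp;

        private Record(String userId, String tag, String timestamp) {
            this.userId = userId;
            this.tag = tag;
            this.timestamp = timestamp;
        }
    }

    private RawLogParser() {
        // static helper only, never instantiated
    }

    /**
     * Parses one line. Returns null when the line is null or does not
     * contain exactly three tab-separated fields.
     */
    static Record parse(String line) {
        if (line == null) {
            return null;
        }
        // limit -1 keeps trailing empty fields instead of dropping them
        String[] fields = line.split("\t", -1);
        if (fields.length != 3) {
            return null;
        }
        return new Record(fields[0], fields[1], fields[2]);
    }

    public static void main(String[] args) {
        Record r = parse("u1\tnews\t2010-09-21");
        System.out.println(r.userId + "|" + r.tag + "|" + r.timestamp);
    }
}
```

With this split, each of the two InputFormat/RecordReader classes only has to read a line and hand it to RawLogParser.parse, so a change to the log format touches one place.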