Re: InputFormat version problem

Tianqiang Li Tue, 21 Sep 2010 23:17:27 -0700

Hi, Edward,
Thanks for the hints. I'll start with the old API first.
Just curious: does Hive plan to support the 0.20 API?


-Peter

On Tue, Sep 21, 2010 at 9:17 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Wed, Sep 22, 2010 at 12:08 AM, Tianqiang Li <peter...@gmail.com> wrote:
> > Hi,
> > I have a customized InputFormat class that reads our log format in our
> > Hadoop jobs and in Pig. It is built on top of the Hadoop 0.20 API. Now I'd
> > like to reuse this InputFormat to load data into a Hive table by specifying
> > the InputFormat and a SerDe when I create the table, like below:
> >
> > CREATE TABLE rawlog_test (
> >   user_id  STRING,
> >   tag  STRING,
> >   my_timestamp  STRING )
> > ROW FORMAT SERDE 'x.y.z.mySerDe'
> > STORED AS INPUTFORMAT 'x.y.z.myInputFormat'
> > OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat' ;
> >
> > Then I run:
> > load data inpath '/rawlog.txt' into table rawlog_test;
> >
> > No error showed up on screen, but I found that the deserialize function
> > never got called. And when I ran select * from rawlog_test; an error was
> > thrown:
> >
> > FAILED: Error in semantic analysis: line 1:14 Input Format must implement
> > InputFormat rawlog_test
> >
> > I searched for this on the internet and found it might be related to Hive
> > using the old (0.17) InputFormat API. Does anybody know of a way to get the
> > 0.20 API working with Hive? Adapting my code to the old API would take a
> > lot of work, and even if I got it done, maintaining two versions of the
> > code seems unnecessary. (Pig 0.7 works well with my 0.20 version of the
> > InputFormat, and we need to use Pig and Hive in different situations.) Is
> > there any way I can work around this? My version of Hive is 0.7, and
> > Hadoop is 0.20.1 from CDH2. Thanks.
> >
> > Regards,
> > Peter
> >
> >
>
> You can make a 0.20 InputFormat work with Hive, but it's a real PITA. The
> HBase and Cassandra handlers both do it. Essentially, you have to extend
> the new mapreduce input format and then implement the methods of the old
> one, using final variables and chained method calls. Example here:
>
> https://issues.apache.org/jira/secure/attachment/12452140/hive-1434-4-patch.txt
>
> If your input format is simple enough, it is likely easier to write two
> separate classes, one for the old API and one for the new. Use the
> mapred.* InputFormat with Hive.
>
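
The approach described above, putting an old-API face on a new-API reader,
can be sketched roughly as follows. This is a minimal, self-contained
illustration and not the code from the linked patch. The interfaces
NewStyleReader and OldStyleReader are hypothetical stand-ins that only mirror
the method shapes of the mapreduce.* RecordReader (nextKeyValue,
getCurrentKey, getCurrentValue) and the mapred.* RecordReader (createKey,
createValue, next). MutableLong, LineReader, OldApiAdapter, and AdapterSketch
are likewise invented names, and the IOException declarations of the real
Hadoop methods are left out so the sketch runs without Hadoop on the
classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class AdapterSketch {
    // Drives a reader the way an old-API consumer would:
    // one key holder and one value holder are reused for every record.
    static List<String> readAll(OldStyleReader<MutableLong, StringBuilder> reader) {
        List<String> out = new ArrayList<>();
        MutableLong key = reader.createKey();
        StringBuilder value = reader.createValue();
        while (reader.next(key, value)) {
            out.add(key.value + ":" + value);
        }
        reader.close();
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("u1\tclick", "u2\tview");
        for (String record : readAll(new OldApiAdapter(new LineReader(lines)))) {
            System.out.println(record);
        }
    }
}

// Hypothetical stand-in for a new-API (mapreduce.*) style reader:
// advance with nextKeyValue(), then read the current key and value.
interface NewStyleReader<K, V> {
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
    void close();
}

// Hypothetical stand-in for an old-API (mapred.*) style reader:
// the caller allocates reusable holders and next() fills them in.
interface OldStyleReader<K, V> {
    K createKey();
    V createValue();
    boolean next(K key, V value);
    void close();
}

// Mutable key holder, playing the role a Writable would play in Hadoop.
final class MutableLong {
    long value;
}

// Toy new-style reader that yields (line index, line text) pairs.
final class LineReader implements NewStyleReader<Long, String> {
    private final Iterator<String> lines;
    private long index = -1;
    private String current;

    LineReader(List<String> lines) {
        this.lines = lines.iterator();
    }

    public boolean nextKeyValue() {
        if (!lines.hasNext()) {
            return false;
        }
        current = lines.next();
        index++;
        return true;
    }

    public Long getCurrentKey() { return index; }
    public String getCurrentValue() { return current; }
    public void close() { }
}

// The adapter: exposes the old-style contract and delegates every call
// to the wrapped new-style reader held in a final field.
final class OldApiAdapter implements OldStyleReader<MutableLong, StringBuilder> {
    private final NewStyleReader<Long, String> delegate;

    OldApiAdapter(NewStyleReader<Long, String> delegate) {
        this.delegate = delegate;
    }

    public MutableLong createKey() { return new MutableLong(); }
    public StringBuilder createValue() { return new StringBuilder(); }

    public boolean next(MutableLong key, StringBuilder value) {
        if (!delegate.nextKeyValue()) {
            return false;
        }
        // Copy the delegate's current record into the caller's reusable holders.
        key.value = delegate.getCurrentKey();
        value.setLength(0);
        value.append(delegate.getCurrentValue());
        return true;
    }

    public void close() { delegate.close(); }
}
```

A real bridge would also have to translate splits and job configuration
between the two APIs, which is probably where most of the pain comes from.
For a simple line-oriented format, writing a separate mapred.* class as
suggested above may well be less work.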
