Either the handler would need to provide its own InputFormat and Split classes 
wrapping the ones from DBInputFormat (following the example from existing 
storage handlers such as HBase, where HBaseSplit extends FileSplit and wraps an 
underlying TableSplit), or we would need to finally clean up HiveInputFormat to 
stop assuming everything is file-based.
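
To make the first option concrete, here is a rough sketch of the wrapping pattern. It is not taken from Hive or HBase: InputSplit and FileSplit below are simplified stand-ins for the Hadoop types of the same name, and DbRowRangeSplit, WrappedDbSplit, FileOnlyFramework, and the dummy path are hypothetical names made up for illustration. The idea is only that the wrapper passes for a FileSplit wherever file-based code expects one, while carrying the real database split inside.

```java
// Simplified stand-ins for the Hadoop types named above. They are NOT the
// real org.apache.hadoop.mapred classes; they model only what this sketch needs.
interface InputSplit {
    long getLength();
}

class FileSplit implements InputSplit {
    private final String path;
    private final long start;
    private final long length;

    FileSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    String getPath() { return path; }
    long getStart() { return start; }
    @Override public long getLength() { return length; }
}

// Hypothetical stand-in for a database split: a half-open range of rows.
class DbRowRangeSplit implements InputSplit {
    private final long firstRow;
    private final long endRow;

    DbRowRangeSplit(long firstRow, long endRow) {
        this.firstRow = firstRow;
        this.endRow = endRow;
    }

    long getFirstRow() { return firstRow; }
    long getEndRow() { return endRow; }
    @Override public long getLength() { return endRow - firstRow; }
}

// Hypothetical wrapper: a FileSplit by type (with a dummy path), but it
// carries the real database split for the record reader to unwrap.
class WrappedDbSplit extends FileSplit {
    private final DbRowRangeSplit delegate;

    WrappedDbSplit(DbRowRangeSplit delegate, String dummyPath) {
        super(dummyPath, 0, delegate.getLength());
        this.delegate = delegate;
    }

    DbRowRangeSplit getDelegate() { return delegate; }
}

// Illustration of file-oriented framework code that casts every split to
// FileSplit (an assumption made for this sketch, not actual Hive code).
class FileOnlyFramework {
    static String describe(InputSplit split) {
        FileSplit fs = (FileSplit) split; // ClassCastException for non-file splits
        return fs.getPath() + "@" + fs.getStart();
    }
}

class WrappedSplitDemo {
    public static void main(String[] args) {
        DbRowRangeSplit raw = new DbRowRangeSplit(100, 250);
        WrappedDbSplit wrapped = new WrappedDbSplit(raw, "/dummy/employee");

        // The wrapper survives the file-only code path...
        System.out.println(FileOnlyFramework.describe(wrapped));
        // ...and the handler's own reader can still recover the row range.
        System.out.println(wrapped.getDelegate().getFirstRow());

        try {
            FileOnlyFramework.describe(raw);
        } catch (ClassCastException e) {
            System.out.println("raw DB split rejected by file-only code");
        }
    }
}
```

In a real handler the wrapper would also have to serialize the wrapped split along with its own fields, which this sketch leaves out.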

JVS

On Aug 4, 2010, at 3:34 AM, amit jaiswal wrote:

> Hi,
> 
> Any pointers on what needs to be done to implement a storage handler for
> this functionality? What needs to be taken care of? Will it be a small
> change, or something big?
> 
> -regards
> Amit
> 
> 
> ----- Original Message ----
> From: Edward Capriolo <[email protected]>
> To: [email protected]
> Sent: Mon, 2 August, 2010 7:58:55 PM
> Subject: Re: How to mount/proxy a db table in hive
> 
> On Mon, Aug 2, 2010 at 2:33 AM, Sonal Goyal <[email protected]> wrote:
>> Hi Amit,
>> 
>> Hive needs data to be stored in its own namespace. Can you please explain
>> why you want to call the database through Hive ?
>> 
>> Thanks and Regards,
>> Sonal
>> www.meghsoft.com
>> http://in.linkedin.com/in/sonalgoyal
>> 
>> 
>> On Mon, Aug 2, 2010 at 11:56 AM, amit jaiswal <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> I have a database and am looking for a way to 'mount' a db table in Hive
>>> in such a way that a select query in Hive gets translated to a SQL query
>>> for the database. I saw DBInputFormat and sqoop, but nothing that can
>>> create a proxy table in Hive which internally makes db calls.
>>> 
>>> I also tried to use a custom variant of DBInputFormat as the input format
>>> for the database table.
>>> 
>>> create table employee (id int, name string) stored as INPUTFORMAT
>>> 'mycustominputformat' OUTPUTFORMAT
>>> 'org.apache.hadoop.mapred.SequenceFileOutputFormat';
>>> 
>>> select id from employee;
>>> This fails while running hadoop job because HiveInputFormat only supports
>>> FileSplits.
>>> 
>>> HiveInputFormat:
>>>   public long getStart() {
>>>     if (inputSplit instanceof FileSplit) {
>>>       return ((FileSplit)inputSplit).getStart();
>>>     }
>>>     return 0;
>>>   }
>>> 
>>> Any suggestions on whether there is an InputFormat implementation that
>>> can be used?
>>> 
>>> -amit
>> 
>> 
> Maybe the new 'storage handlers' would help. Storage handlers tie
> together input formats, serdes, and create/drop table functions. A
> JDBC backend storage handler would be a pretty neat thing.
> 
