[jira] [Commented] (HIVE-3752) Add a non-sql API in hive to access data.

Namit Jain (JIRA) Thu, 29 Nov 2012 20:08:02 -0800

[ https://issues.apache.org/jira/browse/HIVE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507078#comment-13507078 ]


Namit Jain commented on HIVE-3752:
----------------------------------

[~appodictic], yes we did. 

While HCatalog is a neat project, there are several reasons why a Hive 
input/output format packaged with Hive is better for Apache Giraph:
* Unfortunately, HCatalog (trunk) is not compatible with Hadoop 0.20.
* HCatalog is much more than an API for using Hive. We only require a small 
part of HCatalog's functionality, and owning just that portion will make it 
easier to fix, update, and maintain going forward.
* Having an input/output format that is part of Hive will guarantee its 
compatibility with Hive going forward.

As an aside, HCatalog could also use this new input/output format to interface 
with Hive, potentially simplifying part of its code.

In a nutshell, HCatalog is overkill for our simple use case, and we want to 
depend on as few other systems as possible.
For a simple use case like ours, enhancing Hive seems like a much simpler option 
and easier to maintain in the long term.

ccing [~alangates], [~cwsteinbach]
                
> Add a non-sql API in hive to access data.
> -----------------------------------------
>
>                 Key: HIVE-3752
>                 URL: https://issues.apache.org/jira/browse/HIVE-3752
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Nitay Joffe
>
> We would like to add an input/output format for accessing Hive data in Hadoop 
> directly, without having to use, e.g., a transform. Using a transform
> means running a whole map-reduce step with its own disk accesses and its 
> imposed structure. It also means making Hive the base infrastructure for the 
> entire system being developed, which is not the right fit, as we only need a 
> small part of it (access to the data).
> So we propose adding an API-level InputFormat and OutputFormat to Hive that 
> will make it trivially easy to select a table with a partition spec and read 
> from / write to it. We chose this design to make it compatible with Hadoop so 
> that existing systems that work with Hadoop's IO API will just work out of 
> the box.
> We need this system for the Giraph graph processing system 
> (http://giraph.apache.org/), as running graph jobs that read from / write to 
> Hive is a common use case.
> [~namitjain] [~aching] [~kevinwilfong] [~apresta]
