Nitay Joffe created HIVE-3752:
---------------------------------

             Summary: Add a non-sql API in hive to access data.
                 Key: HIVE-3752
                 URL: https://issues.apache.org/jira/browse/HIVE-3752
             Project: Hive
          Issue Type: Improvement
            Reporter: Nitay Joffe


We would like to add an input/output format for reading and writing Hive data
directly from Hadoop, without having to go through something like a transform.
Using a transform means running a whole map-reduce step, with its own disk
accesses and its imposed structure. It also means making Hive the base
infrastructure for the entire system being developed, which is not the right
fit because we only need a small part of it (access to the data).

So we propose adding an API-level InputFormat and OutputFormat to Hive that
make it easy to select a table with a partition spec and read from or write to
it. We chose this design for compatibility with Hadoop, so that existing
systems built on Hadoop's IO API work out of the box.
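
To make "select a table with partition spec" concrete, here is a rough sketch
of what the caller-facing description might look like. Everything in it is
hypothetical and not part of Hive: the class names, the "sketch.hive.*"
property keys, and the example table "edges" are invented for illustration.
It relies only on two known conventions: Hive lays out partitions as key=value
path segments, and Hadoop jobs carry their settings as string properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Hive.
public class HiveTableSpecSketch {

    /** Describes which table and partition a job wants to read or write. */
    public static final class TableSpec {
        private final String database;
        private final String table;
        // Insertion-ordered so partition keys keep the order the caller gave.
        private final Map<String, String> partition = new LinkedHashMap<>();

        public TableSpec(String database, String table) {
            this.database = database;
            this.table = table;
        }

        /** Adds one partition key/value pair and returns this for chaining. */
        public TableSpec withPartition(String key, String value) {
            partition.put(key, value);
            return this;
        }

        /** Renders the partition spec as key=value segments joined by '/'. */
        public String partitionPath() {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : partition.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('/');
                }
                sb.append(e.getKey()).append('=').append(e.getValue());
            }
            return sb.toString();
        }

        /**
         * Flattens the spec into string properties, the form in which an
         * input/output format would receive it from a job configuration.
         * The property keys are made up for this sketch.
         */
        public Map<String, String> toProperties() {
            Map<String, String> props = new LinkedHashMap<>();
            props.put("sketch.hive.database", database);
            props.put("sketch.hive.table", table);
            props.put("sketch.hive.partition", partitionPath());
            return props;
        }
    }

    public static void main(String[] args) {
        TableSpec spec = new TableSpec("default", "edges")
                .withPartition("ds", "2012-11-28")
                .withPartition("country", "us");
        // Print each property the hypothetical input format would read.
        spec.toProperties().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The point of keeping the spec as plain string properties is that an existing
Hadoop-based system would only need to set a few configuration values and
name the input or output format class, with no dependency on the rest of Hive.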

We need this for Giraph, the graph processing system
(http://giraph.apache.org/), since running graph jobs that read from and write
to Hive is a common use case.

[~namitjain] [~aching] [~kevinwilfong] [~apresta]


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
