[
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772021#comment-13772021
]
Alan Gates commented on HIVE-5317:
----------------------------------
Brock, we did look at that. We didn't go that route for a couple of reasons:
# Adding transactions to HBase is a fair amount of work. See Google's
Percolator paper on one approach to that.
# HBase can't offer the same scan speed as HDFS. Since we're choosing to focus
this on updates done in OLAP-style workloads, HBase isn't going to be a
great storage mechanism for the data. I agree it might make sense to have
transactions on HBase for a more OLTP-style workload.
> Implement insert, update, and delete in Hive with full ACID support
> -------------------------------------------------------------------
>
> Key: HIVE-5317
> URL: https://issues.apache.org/jira/browse/HIVE-5317
> Project: Hive
> Issue Type: New Feature
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: InsertUpdatesinHive.pdf
>
>
> Many customers want to be able to insert, update and delete rows from Hive
> tables with full ACID support. The use cases are varied, but the forms of
> queries that should be supported are:
> * INSERT INTO tbl SELECT …
> * INSERT INTO tbl VALUES ...
> * UPDATE tbl SET … WHERE …
> * DELETE FROM tbl WHERE …
> * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN
> ...
> * SET TRANSACTION LEVEL …
> * BEGIN/END TRANSACTION
> Use Cases
> * Once an hour, a set of inserts and updates (up to 500k rows) for various
> dimension tables (e.g., customer, inventory, stores) needs to be processed. The
> dimension tables have primary keys and are typically bucketed and sorted on
> those keys.
> * Once a day, a small set (up to 100k rows) of records needs to be deleted for
> regulatory compliance.
> * Once an hour, a log of transactions is exported from an RDBMS and the fact
> tables need to be updated (up to 1m rows) to reflect the new data. The
> transactions are a combination of inserts, updates, and deletes. The table is
> partitioned and bucketed.
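
As a rough illustration of the semantics the use cases above ask for, here is a minimal sketch of an atomic batch of inserts, updates, and deletes against a keyed dimension table. It uses Python's built-in sqlite3 module purely as a stand-in and is not Hive code or the design proposed in this issue. The table name `customer`, the helper `apply_batch`, and the sample rows are all hypothetical.

```python
import sqlite3


def apply_batch(conn, upserts, deletes):
    """Apply one batch of changes atomically: every change lands or none do.

    Hypothetical helper for illustration only; sqlite3 stands in for Hive.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        for key, name in upserts:
            # MERGE-like logic: WHEN MATCHED THEN UPDATE ...
            cur = conn.execute(
                "UPDATE customer SET name = ? WHERE id = ?", (name, key)
            )
            if cur.rowcount == 0:
                # ... WHEN NOT MATCHED THEN INSERT
                conn.execute(
                    "INSERT INTO customer (id, name) VALUES (?, ?)", (key, name)
                )
        for key in deletes:
            conn.execute("DELETE FROM customer WHERE id = ?", (key,))


def snapshot(conn):
    """Return the whole table ordered by primary key."""
    return conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.commit()

# A successful batch: one update, one insert, one delete.
apply_batch(conn, upserts=[(2, "robert"), (3, "carol")], deletes=[1])
rows_after_batch = snapshot(conn)

# A failing batch: the NULL name violates NOT NULL, so the earlier insert
# of id 4 in the same batch is rolled back as well.
try:
    apply_batch(conn, upserts=[(4, "dave"), (5, None)], deletes=[2])
except sqlite3.IntegrityError:
    pass
rows_after_failed_batch = snapshot(conn)
```

The second call shows the atomicity part of ACID: readers never see a half-applied hourly batch. Isolation between concurrent readers and writers, which the issue also asks for, is not exercised by this single-connection sketch.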