Re: Immutable data in Hive

Alan Gates Wed, 30 Dec 2015 10:01:00 -0800

Traditionally, data in Hive was write once (insert), read many. You could append to tables and partitions, add new partitions, etc. You could remove data by dropping tables or partitions. But there were no updates of data or deletes of particular rows. This is what was meant by immutable. Hive was originally built this way because it was based on MapReduce and HDFS, and these were the natural semantics given those underlying systems.

For many use cases (e.g. ETL) this is sufficient, and the vast majority of people still run Hive this way.

We added transactions, updates, and deletes to Hive because some use cases require these features. Hive is being used more and more as a data warehouse, and while updates and deletes are less common there, they are still required (slowly changing dimensions, fixing wrong data, deleting records for compliance, etc.). Also, streaming data into warehouses from transactional systems is a common use case.


Alan.

Ashok Kumar <mailto:ashok34...@yahoo.com>
December 29, 2015 at 14:59
Hi,

Can someone please clarify what "immutable data" in Hive means?
I have been told that data in Hive is/should be immutable, but in that case why do we need transactional tables in Hive that allow updates to data?
thanks and greetings
