[ 
https://issues.apache.org/jira/browse/HIVE-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6405:
-----------------------------------

    Description: 
HCatalog currently treats all tables as "immutable" - i.e. all tables and 
partitions can be written to only once, and not appended. The nuances of what 
this means are as follows:

 * A non-partitioned table can be written to, and data in it is never updated 
from then on unless you drop and recreate.

 * A partitioned table may support "appending" of a sort, by adding new 
partitions to the table, but once written, the partitions themselves cannot 
have any new data added to them.

Hive, on the other hand, does allow us to "INSERT INTO" into a table, thus 
giving us append semantics. There is benefit to both of these models, and so, 
our goal is as follows:

a) Introduce a notion of an immutable table, have this be a table property, and 
keep tables mutable (not immutable) by default. If this property is set for a 
table, and we attempt to write to a table (or a partition) that already has 
data, disallow "INSERT INTO" into it from Hive. Setting this property will 
allow Hive to mimic HCatalog's current immutable-table behavior. (I'm going to 
create a separate sub-task to cover this bit, and focus on the HCatalog side in 
this JIRA.)

b) As long as that flag is not set, HCatalog should be changed to allow appends 
into the table as well, and not simply error out if data already exists in it.

  was:
HCatalog currently treats all tables as "immutable" - i.e. all tables and 
partitions can be written to only once, and not appended. The nuances of what 
this means are as follows:

 * A non-partitioned table can be written to, and data in it is never updated 
from then on unless you drop and recreate.

 * A partitioned table may support "appending" of a sort, by adding new 
partitions to the table, but once written, the partitions themselves cannot 
have any new data added to them.

Hive, on the other hand, does allow us to "INSERT INTO" a table, thus giving 
us append semantics. There is benefit to both of these models, and so, our goal 
is as follows:

a) Introduce a notion of an immutable table, have this be a table property, and 
keep tables mutable (not immutable) by default. If this property is set for a 
table, and we attempt to write to a table (or a partition) that already has 
data, disallow "INSERT INTO" into it from Hive. Setting this property will 
allow Hive to mimic HCatalog's current immutable-table behavior. (I'm going to 
create a separate sub-task to cover this bit, and focus on the HCatalog side in 
this JIRA.)

b) As long as that flag is not set, HCatalog should be changed to allow appends 
into the table as well, and not simply error out if data already exists in it.


> Support append feature for HCatalog
> -----------------------------------
>
>                 Key: HIVE-6405
>                 URL: https://issues.apache.org/jira/browse/HIVE-6405
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Metastore, Query Processor, Thrift API
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>
> HCatalog currently treats all tables as "immutable" - i.e. all tables and 
> partitions can be written to only once, and not appended. The nuances of what 
> this means are as follows:
>  * A non-partitioned table can be written to, and data in it is never updated 
> from then on unless you drop and recreate.
>  * A partitioned table may support "appending" of a sort, by adding new 
> partitions to the table, but once written, the partitions themselves cannot 
> have any new data added to them.
> Hive, on the other hand, does allow us to "INSERT INTO" into a table, thus 
> giving us append semantics. There is benefit to both of these models, and 
> so, our goal is as follows:
> a) Introduce a notion of an immutable table, have this be a table property, 
> and keep tables mutable (not immutable) by default. If this property is set 
> for a table, and we attempt to write to a table (or a partition) that already 
> has data, disallow "INSERT INTO" into it from Hive. Setting this property 
> will allow Hive to mimic HCatalog's current immutable-table behavior. (I'm 
> going to create a separate sub-task to cover this bit, and focus on the 
> HCatalog side in this JIRA.)
> b) As long as that flag is not set, HCatalog should be changed to allow 
> appends into the table as well, and not simply error out if data already 
> exists in it.
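
The write rule described in (a) and (b) can be modelled roughly as below. This 
is only an illustrative sketch, not HCatalog or Hive code: the Table class, the 
"immutable" flag name, and ImmutableTableError are all hypothetical names made 
up for the example, since the description does not name the property. A 
partition key of None stands in for a non-partitioned table.

```python
from dataclasses import dataclass, field


class ImmutableTableError(Exception):
    """Raised when a write targets an immutable table/partition that has data."""


@dataclass
class Table:
    # Hypothetical flag; per the proposal, tables are NOT immutable by default.
    immutable: bool = False
    # Partition key -> rows. The key None models a non-partitioned table.
    partitions: dict = field(default_factory=dict)

    def write(self, rows, partition=None):
        existing = self.partitions.get(partition)
        # (a) Immutable and the target already has data: refuse the write.
        if self.immutable and existing:
            raise ImmutableTableError(
                f"partition {partition!r} already has data and table is immutable"
            )
        # (b) Otherwise append to the target; new partitions are always allowed.
        self.partitions.setdefault(partition, []).extend(rows)
```

With the flag left unset, a second write to the same table or partition 
appends. With the flag set, the first write to each partition succeeds and any 
later write to a partition that already holds data is rejected, which is the 
behavior the description attributes to HCatalog today.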



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
