[jira] [Updated] (HIVE-7046) Propagate addition of new columns to partition schema

Mariano Dominguez (JIRA) Mon, 12 May 2014 12:51:53 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mariano Dominguez updated HIVE-7046:
------------------------------------

    Description: 
Hive reads data according to the partition schema, not the table schema 
(because of HIVE-3833). ALTER TABLE only updates the table schema, and the 
changes are not propagated to partitions. Thus, the schema of a partition will 
differ from that of the table after altering the table schema; this is done to 
preserve the ability to read existing data, particularly when using binary 
formats such as RCFile. Binary formats do not allow changing the type of a 
field because of the way serialization works; a field serialized as a string 
will be displayed incorrectly if read as an integer.

Unfortunately, as a side effect, this behavior limits the ability to add new 
columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible 
workaround is to manually recreate the partitions, but this process could be 
unnecessarily cumbersome if the number of partitions is high. New columns 
should be propagated to existing partitions automatically instead.


  was:
Hive reads data according to the partition schema, not the table schema 
(because of HIVE-3833). ALTER TABLE only updates the table schema, and the 
changes are not propagated to partitions. Thus, the schema of a partition will 
differ from that of the table after altering the table schema; this is done to 
preserve the ability to read existing data, particularly when using binary 
formats such as RCFile. Binary formats do not allow changing the type of a 
field because of the way serialization works; a field serialized as a string 
will be displayed incorrectly if read as an integer.

Unfortunately, as a side effect, this behavior limits the ability to add new 
columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible 
workaround is to recreate the partitions, but this process could be 
unnecessarily cumbersome if the number of partitions is high. New columns 
should be propagated to existing partitions automatically instead.



> Propagate addition of new columns to partition schema
> -----------------------------------------------------
>
>                 Key: HIVE-7046
>                 URL: https://issues.apache.org/jira/browse/HIVE-7046
>             Project: Hive
>          Issue Type: Improvement
>          Components: Database/Schema
>    Affects Versions: 0.12.0
>            Reporter: Mariano Dominguez
>
> Hive reads data according to the partition schema, not the table schema 
> (because of HIVE-3833). ALTER TABLE only updates the table schema, and the 
> changes are not propagated to partitions. Thus, the schema of a partition 
> will differ from that of the table after altering the table schema; this is 
> done to preserve the ability to read existing data, particularly when using 
> binary formats such as RCFile. Binary formats do not allow changing the type 
> of a field because of the way serialization works; a field serialized as a 
> string will be displayed incorrectly if read as an integer.
>
> Unfortunately, as a side effect, this behavior limits the ability to add new 
> columns to already existing partitions using ALTER TABLE ADD COLUMNS. A 
> possible workaround is to manually recreate the partitions, but this process 
> could be unnecessarily cumbersome if the number of partitions is high. New 
> columns should be propagated to existing partitions automatically instead.
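
The behavior described above can be illustrated with a small toy model. This is
only a sketch, not Hive code: the Table class, its methods, the propagate flag,
and the sample column and partition names are all hypothetical and exist only
for illustration. It assumes that each partition stores its own copy of the
schema at creation time, and that adding columns to the table leaves those
copies untouched unless propagation is requested.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, column type)


@dataclass
class Table:
    """Toy stand-in for a partitioned table; not a Hive API."""
    columns: List[Column]
    # Each partition keeps its own schema copy, taken when it is created.
    partitions: Dict[str, List[Column]] = field(default_factory=dict)

    def add_partition(self, spec: str) -> None:
        # A new partition snapshots the table schema as it is right now.
        self.partitions[spec] = list(self.columns)

    def add_columns(self, new_columns: List[Column], propagate: bool = False) -> None:
        # The table schema is always updated.
        self.columns.extend(new_columns)
        # Hypothetical switch modeling the improvement requested in this issue:
        # append the new columns to every existing partition schema as well.
        if propagate:
            for schema in self.partitions.values():
                schema.extend(new_columns)

    def partition_column_names(self, spec: str) -> List[str]:
        # Reads follow the partition schema, not the table schema.
        return [name for name, _ in self.partitions[spec]]


# Reported behavior: the old partition does not see the new column.
reported = Table(columns=[("id", "int"), ("name", "string")])
reported.add_partition("dt=2014-05-01")
reported.add_columns([("email", "string")])
reported.add_partition("dt=2014-05-12")
old_cols = reported.partition_column_names("dt=2014-05-01")
new_cols = reported.partition_column_names("dt=2014-05-12")

# Requested behavior: the new column reaches existing partitions too.
requested = Table(columns=[("id", "int"), ("name", "string")])
requested.add_partition("dt=2014-05-01")
requested.add_columns([("email", "string")], propagate=True)
propagated_cols = requested.partition_column_names("dt=2014-05-01")

print(old_cols)
print(new_cols)
print(propagated_cols)
```

In this model only appended columns are propagated; existing column types are
never changed, which matches the concern about binary formats raised in the
description.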



--
This message was sent by Atlassian JIRA
(v6.2#6252)
