-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/#review1309
-----------------------------------------------------------


Also, can you add migration scripts for the other DBs?


trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql
<https://reviews.apache.org/r/1183/#comment2982>

    Typo



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/1183/#comment2979>

    The check and the delete should be in the same transaction, as it's possible 
for a reference to a CD to be created after the check but before the delete.
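
    To illustrate the concern, here is a minimal in-memory sketch, not the 
ObjectStore code. The class and method names (CdCleanupSketch, addSd, dropSd) 
are hypothetical, and a single lock stands in for the metastore transaction. 
The point is only that adding a reference and the check-then-delete run under 
the same guard, so they cannot interleave.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of CDS and SDS; not the real ObjectStore API.
final class CdCleanupSketch {
    // Assumption: this lock plays the role of a single database transaction.
    private final Object txLock = new Object();
    private final Set<Long> cds = new HashSet<>();            // CDS rows (CD_ID)
    private final Map<Long, Long> sdToCd = new HashMap<>();   // SDS rows: SD_ID -> CD_ID

    void createCd(long cdId) {
        synchronized (txLock) {
            cds.add(cdId);
        }
    }

    // Creating a reference takes the same lock as dropSd, so it cannot slip in
    // between the "is it still referenced?" check and the delete.
    boolean addSd(long sdId, long cdId) {
        synchronized (txLock) {
            if (!cds.contains(cdId)) {
                return false; // the CD was already cleaned up
            }
            sdToCd.put(sdId, cdId);
            return true;
        }
    }

    // Remove the SD, then check and delete the CD in one atomic step.
    void dropSd(long sdId) {
        synchronized (txLock) {
            Long cdId = sdToCd.remove(sdId);
            if (cdId == null) {
                return; // unknown SD, nothing to clean up
            }
            if (!sdToCd.containsValue(cdId)) {
                cds.remove(cdId); // no other SD points at this CD
            }
        }
    }

    boolean cdExists(long cdId) {
        synchronized (txLock) {
            return cds.contains(cdId);
        }
    }

    public static void main(String[] args) {
        CdCleanupSketch store = new CdCleanupSketch();
        store.createCd(1L);
        store.addSd(10L, 1L); // table SD
        store.addSd(11L, 1L); // partition SD sharing the table's CD
        store.dropSd(10L);
        System.out.println("CD kept while referenced: " + store.cdExists(1L));
        store.dropSd(11L);
        System.out.println("CD present after last drop: " + store.cdExists(1L));
    }
}
```

    If the check and the delete took the lock separately, addSd could run in 
between them and leave an SD pointing at a deleted CD.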



trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/1183/#comment2981>

    How does this drop the storage descriptor?



trunk/metastore/src/model/package.jdo
<https://reviews.apache.org/r/1183/#comment2968>

    Fix indent


- Paul


On 2011-08-05 20:49:19, Sohan Jain wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1183/
> -----------------------------------------------------------
> 
> (Updated 2011-08-05 20:49:19)
> 
> 
> Review request for hive, Ning Zhang and Paul Yang.
> 
> 
> Summary
> -------
> 
> This patch tries to make minimal changes to the API while keeping migration 
> short and somewhat easy to revert.
> 
> The new schema can be described as follows:
> - CDS is a table corresponding to Column Descriptor objects.  Currently, it 
> only stores a CD_ID.
> - COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns.  A 
> Column Descriptor holds a list of columns.  COLUMNS_V2 has a foreign key to 
> the CD_ID to which it belongs.
> - SDS was modified to reference a Column Descriptor. So SDS now has a foreign 
> key to a CD_ID which describes its columns.
> 
> During migration, we create Column Descriptors for tables in a 
> straightforward manner: their columns are now just wrapped inside a column 
> descriptor.  The SDS of partitions use their parent table's column 
> descriptor, since currently a partition and its table share the same list of 
> columns.
> 
> When altering or adding a partition, give it its parent table's column 
> descriptor IF the columns they describe are the same.  Otherwise, create a 
> new column descriptor for its columns.
> 
> When adding or altering a table, create a new column descriptor every time.
> 
> Whenever you drop a storage descriptor (e.g., when dropping tables or 
> partitions), check to see if the related column descriptor has any other 
> references in the table.  That is, check to see if any other storage 
> descriptors point to that column descriptor.  If none do, then delete that 
> column descriptor.  This check is in place so we don't have unreferenced 
> column descriptors and columns hanging around after schema evolution for 
> tables.
> 
> 
> This addresses bug HIVE-2246.
>     https://issues.apache.org/jira/browse/HIVE-2246
> 
> 
> Diffs
> -----
> 
>   trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION 
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1153927 
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1153927 
>   trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION 
>   trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1153927 
>   trunk/metastore/src/model/package.jdo 1153927 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1153927 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/MetaDataFormatUtils.java 1153927 
> 
> Diff: https://reviews.apache.org/r/1183/diff
> 
> 
> Testing
> -------
> 
> Passes Facebook's regression testing and all existing test cases.  In one 
> instance, before migration, the overhead involved with storage descriptors 
> and columns was ~11 GB.  After migration, the overhead was ~1.5 GB.
> 
> 
> Thanks,
> 
> Sohan
> 
>
