-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/#review1309
-----------------------------------------------------------

Also, can you add migration scripts for the other DBs?


trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql
<https://reviews.apache.org/r/1183/#comment2982>

    Typo


trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/1183/#comment2979>

    The check and the delete should be in the same transaction, as it's possible for a reference to a CD to be created after the check but before the delete.


trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
<https://reviews.apache.org/r/1183/#comment2981>

    How does this drop the storage descriptor?


trunk/metastore/src/model/package.jdo
<https://reviews.apache.org/r/1183/#comment2968>

    Fix indent


- Paul


On 2011-08-05 20:49:19, Sohan Jain wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1183/
> -----------------------------------------------------------
> 
> (Updated 2011-08-05 20:49:19)
> 
> 
> Review request for hive, Ning Zhang and Paul Yang.
> 
> 
> Summary
> -------
> 
> This patch tries to make minimal changes to the API while keeping migration
> short and somewhat easy to revert.
> 
> The new schema can be described as follows:
> - CDS is a table corresponding to Column Descriptor objects. Currently, it
>   only stores a CD_ID.
> - COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns. A
>   Column Descriptor holds a list of columns. COLUMNS_V2 has a foreign key to
>   the CD_ID to which it belongs.
> - SDS was modified to reference a Column Descriptor, so SDS now has a foreign
>   key to the CD_ID that describes its columns.
> 
> During migration, we create Column Descriptors for tables in a
> straightforward manner: their columns are now just wrapped inside a column
> descriptor. The SDs of partitions use their parent table's column
> descriptor, since currently a partition and its table share the same list of
> columns.
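To make the new schema concrete, here is a minimal in-memory sketch of the migration just described. The classes `ColumnDescriptorMigration`, `Column`, `ColumnDescriptor` and `StorageDescriptor` are hypothetical stand-ins for rows of COLUMNS_V2, CDS and SDS; they are not the patch's JDO model classes, and nothing is persisted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-ins for the rows described in the summary;
// these are NOT the actual MColumnDescriptor/MStorageDescriptor JDO classes.
public class ColumnDescriptorMigration {

    // One row of COLUMNS_V2: a column belonging to a column descriptor.
    record Column(String name, String type) {}

    // One row of CDS plus its COLUMNS_V2 children (linked via CD_ID).
    static final class ColumnDescriptor {
        final long cdId;
        final List<Column> columns;

        ColumnDescriptor(long cdId, List<Column> columns) {
            this.cdId = cdId;
            this.columns = List.copyOf(columns);
        }
    }

    // One row of SDS: after migration it references a CD instead of owning columns.
    static final class StorageDescriptor {
        ColumnDescriptor cd;

        StorageDescriptor(ColumnDescriptor cd) {
            this.cd = cd;
        }
    }

    private long nextCdId = 1;

    // Tables: wrap the existing column list in a fresh column descriptor.
    StorageDescriptor migrateTable(List<Column> tableColumns) {
        return new StorageDescriptor(new ColumnDescriptor(nextCdId++, tableColumns));
    }

    // Partitions: reuse the parent table's descriptor, since before migration
    // a partition and its table hold the same column list.
    StorageDescriptor migratePartition(StorageDescriptor tableSd) {
        return new StorageDescriptor(tableSd.cd);
    }

    public static void main(String[] args) {
        ColumnDescriptorMigration m = new ColumnDescriptorMigration();
        StorageDescriptor table = m.migrateTable(
                List.of(new Column("id", "bigint"), new Column("name", "string")));
        List<StorageDescriptor> partitions = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            partitions.add(m.migratePartition(table));
        }
        // All four SDs share one CD, so the two columns are stored once, not four times.
        System.out.println(partitions.get(2).cd == table.cd); // true
        System.out.println(table.cd.columns.size());          // 2
    }
}
```

The point the sketch illustrates is that a CD is shared by reference: every partition SD created during migration points at the table's single descriptor instead of carrying its own copy of the column rows.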
> 
> When altering or adding a partition, give it its parent table's column
> descriptor IF the columns they describe are the same. Otherwise, create a
> new column descriptor for its columns.
> 
> When adding or altering a table, create a new column descriptor every time.
> 
> Whenever you drop a storage descriptor (e.g., when dropping tables or
> partitions), check to see if the related column descriptor has any other
> references in the SDS table. That is, check to see if any other storage
> descriptors point to that column descriptor. If none do, then delete that
> column descriptor. This check is in place so we don't have unreferenced
> column descriptors and columns hanging around after schema evolution for
> tables.
> 
> 
> This addresses bug HIVE-2246.
>     https://issues.apache.org/jira/browse/HIVE-2246
> 
> 
> Diffs
> -----
> 
>   trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1153927
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1153927
>   trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION
>   trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1153927
>   trunk/metastore/src/model/package.jdo 1153927
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1153927
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/MetaDataFormatUtils.java 1153927
> 
> Diff: https://reviews.apache.org/r/1183/diff
> 
> 
> Testing
> -------
> 
> Passes Facebook's regression testing and all existing test cases. In one
> instance, before migration, the overhead involved with storage descriptors
> and columns was ~11 GB. After migration, the overhead was ~1.5 GB.
> 
> 
> Thanks,
> 
> Sohan
> 
>
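The column-descriptor lifecycle rules in the request, combined with the reviewer's point on ObjectStore.java that the reference check and the delete must be atomic, can be sketched as follows. This is a hypothetical in-memory model: `ColumnDescriptorStore` and its methods are illustrative names, not Hive or JDO APIs, and a `synchronized` monitor stands in for the datastore transaction, so no caller can attach a new SD to a CD between "no references left" and "delete".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the CD lifecycle rules; not the patch's ObjectStore code.
// Every public operation holds the same monitor, standing in for one transaction.
public class ColumnDescriptorStore {

    record Column(String name, String type) {}

    // CD_ID -> columns (stands in for CDS + COLUMNS_V2).
    private final Map<Long, List<Column>> cds = new HashMap<>();
    // SD_ID -> CD_ID (stands in for the new foreign key on SDS).
    private final Map<Long, Long> sdToCd = new HashMap<>();
    private long nextCdId = 1;
    private long nextSdId = 1;

    private long createCd(List<Column> cols) {
        long id = nextCdId++;
        cds.put(id, List.copyOf(cols));
        return id;
    }

    // Reuse the parent table's CD only if it describes the same columns.
    private long cdForPartition(long tableSdId, List<Column> cols) {
        long tableCd = sdToCd.get(tableSdId);
        return cds.get(tableCd).equals(cols) ? tableCd : createCd(cols);
    }

    // Delete a CD once no SD references it (check and delete under one lock).
    private void deleteIfUnreferenced(long cdId) {
        if (!sdToCd.containsValue(cdId)) {
            cds.remove(cdId);
        }
    }

    // Tables always get a new column descriptor.
    synchronized long addTableSd(List<Column> cols) {
        long sdId = nextSdId++;
        sdToCd.put(sdId, createCd(cols));
        return sdId;
    }

    synchronized long addPartitionSd(long tableSdId, List<Column> cols) {
        long sdId = nextSdId++;
        sdToCd.put(sdId, cdForPartition(tableSdId, cols));
        return sdId;
    }

    // Altering a partition applies the same reuse rule, then cleans up the old CD.
    synchronized void alterPartitionSd(long sdId, long tableSdId, List<Column> cols) {
        long oldCd = sdToCd.get(sdId);
        sdToCd.put(sdId, cdForPartition(tableSdId, cols));
        deleteIfUnreferenced(oldCd);
    }

    // Dropping an SD removes its CD only if no other SD still points at it.
    synchronized void dropSd(long sdId) {
        Long cdId = sdToCd.remove(sdId);
        if (cdId != null) {
            deleteIfUnreferenced(cdId);
        }
    }

    synchronized int cdCount() {
        return cds.size();
    }
}
```

An in-JVM lock only illustrates the atomicity requirement. With several metastore server processes sharing one database, the reference check and the delete would have to run inside a single database transaction with suitable isolation or row locking.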