[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547981#comment-13547981 ]
Hudson commented on HIVE-2246:
------------------------------

Integrated in Hive-trunk-hadoop2 #54 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/54/])
    HIVE-3424. Error by upgrading a Hive 0.7.0 database to 0.8.0 (008-HIVE-2246.mysql.sql) (Alexander Alten-Lorenz via cws) (Revision 1380483)

     Result = ABORTED
cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1380483
Files :
* /hive/trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql


> Dedupe tables' column schemas from partitions in the metastore db
> -----------------------------------------------------------------
>
>                 Key: HIVE-2246
>                 URL: https://issues.apache.org/jira/browse/HIVE-2246
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Sohan Jain
>            Assignee: Sohan Jain
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch,
> HIVE-2246.8.patch
>
>
> Note: this patch proposes a schema change, and is therefore incompatible with
> the current metastore.
> We can re-organize the JDO models to reduce space usage to keep the metastore
> scalable for the future. Currently, partitions are the fastest growing
> objects in the metastore, and the metastore keeps a separate copy of the
> columns list for each partition. We can normalize the metastore db by
> decoupling Columns from Storage Descriptors and not storing duplicate lists
> of the columns for each partition.
> An idea is to create an additional level of indirection with a "Column
> Descriptor" that has a list of columns. A table has a reference to its
> latest Column Descriptor (note: a table may have more than one Column
> Descriptor in the case of schema evolution). Partitions and Indexes can
> reference the same Column Descriptors as their parent table.
> Currently, the COLUMNS table in the metastore has roughly (number of
> partitions + number of tables) * (average number of columns per table) rows.
> We can reduce this to (number of tables) * (average number of columns per
> table) rows, while incurring a small cost proportional to the number of
> tables to store the Column Descriptors.
> Please see the latest review board for additional implementation details.
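
To make the row-count estimate in the description concrete, here is a minimal sketch of the arithmetic in Python. The function names and the sample figures (1,000 tables, 1,000,000 partitions, 20 columns per table) are made up for illustration and do not come from the issue. The sketch also assumes one Column Descriptor per table, which ignores the extra descriptors that schema evolution may create.

```python
def columns_rows_before(num_tables, num_partitions, avg_columns):
    """Rows in COLUMNS when every table and every partition stores its own column list."""
    return (num_partitions + num_tables) * avg_columns


def columns_rows_after(num_tables, avg_columns):
    """Rows in COLUMNS when partitions share their parent table's column list."""
    return num_tables * avg_columns


def column_descriptor_rows(num_tables, descriptors_per_table=1):
    """Extra rows needed for the Column Descriptors themselves.

    descriptors_per_table=1 is an assumption; the issue notes a table may
    have more than one descriptor after schema evolution.
    """
    return num_tables * descriptors_per_table


# Hypothetical metastore size, chosen only to show the scale of the change.
tables, partitions, avg_cols = 1_000, 1_000_000, 20

before = columns_rows_before(tables, partitions, avg_cols)
after = columns_rows_after(tables, avg_cols)
overhead = column_descriptor_rows(tables)

print(before, after, overhead)
```

With these sample figures, the column rows no longer scale with the partition count, and the added cost grows only with the number of tables, as the description states.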