[ https://issues.apache.org/jira/browse/HIVE-25779?focusedWorklogId=792785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792785 ]
ASF GitHub Bot logged work on HIVE-25779:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Jul/22 16:01
            Start Date: 19/Jul/22 16:01
    Worklog Time Spent: 10m
      Work Description: saihemanth-cloudera commented on code in PR #3221:
URL: https://github.com/apache/hive/pull/3221#discussion_r924692232

##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
##########
@@ -5425,6 +5492,55 @@ private void removeUnusedColumnDescriptor(MColumnDescriptor oldCD) {
     }
   }

+  /**
+   * Checks if a serde info has any remaining references by storage descriptors
+   * in the db. If it does not, then delete the SerDe info. If it does, then do nothing.
+   * @param oldSerDeInfo the serde info to delete if it is no longer referenced anywhere
+   */
+  private void removeUnusedSerDeInfo(MSerDeInfo oldSerDeInfo) {
+    if (oldSerDeInfo == null) {
+      return;
+    }
+    LOG.debug("executing removeUnusedSerDeInfo");

Review Comment:
   Wouldn't it be ideal to log the value of the oldSerDeInfo variable in the debug message?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 792785)
    Time Spent: 3h 20m  (was: 3h 10m)

> Deduplicate SerDe Info
> ----------------------
>
>                 Key: HIVE-25779
>                 URL: https://issues.apache.org/jira/browse/HIVE-25779
>             Project: Hive
>          Issue Type: New Feature
>          Components: Standalone Metastore
>            Reporter: Yu-Wen Lai
>            Assignee: Yu-Wen Lai
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The proposal is to reuse serde info in the same way we reuse column
> descriptors (HIVE-2246).
> Currently, we store the metadata for partitions as PARTITIONS (N partitions)
> -> SDS (N locations) -> SERDES (N entries). However, all the SERDES for the
> partitions in a table are the same unless we explicitly specify otherwise.
> That is, each storage descriptor has an associated and exclusive serde info,
> but the partitions' serde infos are mostly just the same as the table's. By
> reusing the serde info, we can save some database storage and improve the
> performance of queries from HMS to the backend database.
> For backward compatibility, we also need to introduce a config for this
> feature, because there will be issues if an old HMS instance and a new HMS
> instance with this feature are running together. With this feature, we need
> to check whether anything else references a serde before deleting it, but
> the old instance will just delete it.
> The other thing we need to take care of is custom serdes. If a partition's
> serde is modified, we need to create a new record in SERDES so that we don't
> interfere with other partitions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
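
As a rough illustration of the two rules in the issue description (delete a serde only when no storage descriptor still references it, and create a new SERDES record when a partition's serde is modified), the following Java sketch models SDS and SERDES as in-memory maps. All names here (`SerDeDedupSketch`, `setSerDe`, `dropStorageDescriptor`, `serdeCount`) are hypothetical and are not part of Hive's `ObjectStore`; the actual change works against the metastore's backing database and may differ in every detail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model of the SDS -> SERDES relationship.
// None of these names come from Hive's ObjectStore.
class SerDeDedupSketch {

    // Value type standing in for a SERDES row: serialization lib plus parameters.
    static final class SerDe {
        final String lib;
        final Map<String, String> params;

        SerDe(String lib, Map<String, String> params) {
            this.lib = lib;
            this.params = new HashMap<>(params);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SerDe)) {
                return false;
            }
            SerDe other = (SerDe) o;
            return lib.equals(other.lib) && params.equals(other.params);
        }

        @Override
        public int hashCode() {
            return Objects.hash(lib, params);
        }
    }

    private final Map<Long, SerDe> serdes = new HashMap<>();   // SERDES: serde id -> row
    private final Map<Long, Long> sdToSerde = new HashMap<>(); // SDS: sd id -> serde id
    private long nextSerdeId = 1;

    // Point a storage descriptor at a serde. An identical existing row is reused;
    // a modified (custom) serde gets a new row, so other descriptors are unaffected.
    long setSerDe(long sdId, SerDe serde) {
        Long previous = sdToSerde.get(sdId);
        long serdeId = findOrCreate(serde);
        sdToSerde.put(sdId, serdeId);
        if (previous != null && previous != serdeId) {
            removeIfUnused(previous);
        }
        return serdeId;
    }

    // Drop a storage descriptor, then delete its serde only if nothing else uses it.
    void dropStorageDescriptor(long sdId) {
        Long serdeId = sdToSerde.remove(sdId);
        if (serdeId != null) {
            removeIfUnused(serdeId);
        }
    }

    int serdeCount() {
        return serdes.size();
    }

    // Linear scan for an equal row; a real store would query the database instead.
    private long findOrCreate(SerDe serde) {
        for (Map.Entry<Long, SerDe> e : serdes.entrySet()) {
            if (e.getValue().equals(serde)) {
                return e.getKey();
            }
        }
        long id = nextSerdeId++;
        serdes.put(id, serde);
        return id;
    }

    // The reference check: keep the row while any storage descriptor points at it.
    private void removeIfUnused(long serdeId) {
        if (!sdToSerde.containsValue(serdeId)) {
            serdes.remove(serdeId);
        }
    }
}
```

In this model, an instance that skips `removeIfUnused`'s check and deletes the row unconditionally would break every other descriptor sharing it, which is the mixed-version hazard the description gives as the reason for a config flag.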