[ 
https://issues.apache.org/jira/browse/HIVE-25779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai updated HIVE-25779:
------------------------------
    Description: 
The proposal is to reuse serde info in the same way that we reuse column 
descriptors (HIVE-2246).

Currently, we store the metadata for partitions as PARTITIONS (N partitions) -> 
SDS (N locations) -> SERDES (N entries). However, all the SERDES entries for 
the partitions in a table are the same unless a serde is explicitly specified. 
That is, each storage descriptor has an associated and exclusive serde info, 
but the partitions' serde infos are mostly the same as the table's. By reusing 
the serde info, we can save database storage and improve the performance of 
queries from HMS to the backend database.
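
To make the storage saving concrete, here is a minimal sketch in Python with an 
in-memory SQLite database. The three tables are heavily simplified stand-ins 
for the metastore tables named above, and the helper names (build, 
count_serdes), the warehouse path, and the choice of OrcSerde are assumptions 
made for illustration, not the actual HMS code.

```python
import sqlite3

ORC_SERDE = "org.apache.hadoop.hive.ql.io.orc.OrcSerde"

def build(n_partitions, share_serde):
    """Create a simplified PARTITIONS -> SDS -> SERDES chain (hypothetical helper)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE SERDES (SERDE_ID INTEGER PRIMARY KEY, SLIB TEXT);
        CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT, SERDE_ID INTEGER);
        CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, PART_NAME TEXT, SD_ID INTEGER);
    """)
    # The table-level serde info: one row in SERDES.
    table_serde_id = db.execute(
        "INSERT INTO SERDES (SLIB) VALUES (?)", (ORC_SERDE,)).lastrowid
    for i in range(n_partitions):
        if share_serde:
            # Proposed layout: the partition points at the table's serde row.
            serde_id = table_serde_id
        else:
            # Current layout: every partition gets its own identical serde row.
            serde_id = db.execute(
                "INSERT INTO SERDES (SLIB) VALUES (?)", (ORC_SERDE,)).lastrowid
        sd_id = db.execute(
            "INSERT INTO SDS (LOCATION, SERDE_ID) VALUES (?, ?)",
            (f"/warehouse/t/p={i}", serde_id)).lastrowid
        db.execute(
            "INSERT INTO PARTITIONS (PART_NAME, SD_ID) VALUES (?, ?)",
            (f"p={i}", sd_id))
    return db

def count_serdes(db):
    """Number of rows in the simplified SERDES table."""
    return db.execute("SELECT COUNT(*) FROM SERDES").fetchone()[0]
```

In this toy model, count_serdes(build(100, share_serde=False)) gives 101 rows 
(one for the table plus one per partition), while 
count_serdes(build(100, share_serde=True)) gives 1.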

For backward compatibility, we also need to introduce a config for this 
feature, because there will be issues if an old HMS instance and a new HMS 
instance with this feature are running together. With this feature, we need to 
check whether other storage descriptors still reference a serde before deleting 
it, but an old instance will just delete it.
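
The difference between the two deletion behaviours can be sketched as follows, 
again in Python with in-memory SQLite. The schema is a simplified stand-in 
without foreign-key constraints, and make_db, drop_sd, and the dedup_enabled 
flag are hypothetical names; the real config name is not decided here.

```python
import sqlite3

def make_db():
    """Two storage descriptors sharing one serde row (simplified, hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE SERDES (SERDE_ID INTEGER PRIMARY KEY, SLIB TEXT);
        CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, SERDE_ID INTEGER);
        INSERT INTO SERDES VALUES (1, 'org.apache.hadoop.hive.ql.io.orc.OrcSerde');
        INSERT INTO SDS VALUES (1, 1), (2, 1);
    """)
    return db

def drop_sd(db, sd_id, dedup_enabled):
    """Drop a storage descriptor and, where safe, its serde row."""
    row = db.execute(
        "SELECT SERDE_ID FROM SDS WHERE SD_ID = ?", (sd_id,)).fetchone()
    if row is None:
        return
    serde_id = row[0]
    db.execute("DELETE FROM SDS WHERE SD_ID = ?", (sd_id,))
    if dedup_enabled:
        # New behaviour: keep the serde row while any other SDS row uses it.
        still_used = db.execute(
            "SELECT 1 FROM SDS WHERE SERDE_ID = ? LIMIT 1", (serde_id,)).fetchone()
        if still_used:
            return
    # Old behaviour (or last reference gone): delete the serde row.
    db.execute("DELETE FROM SERDES WHERE SERDE_ID = ?", (serde_id,))
```

With dedup_enabled=False, dropping storage descriptor 1 also removes serde row 
1 even though storage descriptor 2 still points at it, which is the kind of 
conflict a mixed deployment could run into.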

The other thing we need to take care of is custom serdes. If a partition's 
serde is modified, we need to create a new record in SERDES so that we don't 
interfere with other partitions.
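
One way to read this is as copy-on-write on the shared serde row. The sketch 
below (Python, in-memory SQLite) illustrates the idea under a simplified 
schema; make_db and set_partition_serde are hypothetical names, and updating 
in place when the row is not shared is an assumption rather than something the 
proposal specifies.

```python
import sqlite3

def make_db():
    """Two storage descriptors sharing one serde row (simplified, hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE SERDES (SERDE_ID INTEGER PRIMARY KEY, SLIB TEXT);
        CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, SERDE_ID INTEGER);
        INSERT INTO SERDES VALUES (1, 'org.apache.hadoop.hive.ql.io.orc.OrcSerde');
        INSERT INTO SDS VALUES (1, 1), (2, 1);
    """)
    return db

def set_partition_serde(db, sd_id, new_slib):
    """Change the serde of one storage descriptor without touching the others."""
    (serde_id,) = db.execute(
        "SELECT SERDE_ID FROM SDS WHERE SD_ID = ?", (sd_id,)).fetchone()
    (refs,) = db.execute(
        "SELECT COUNT(*) FROM SDS WHERE SERDE_ID = ?", (serde_id,)).fetchone()
    if refs > 1:
        # Shared row: write a new SERDES record and repoint only this SD.
        new_id = db.execute(
            "INSERT INTO SERDES (SLIB) VALUES (?)", (new_slib,)).lastrowid
        db.execute(
            "UPDATE SDS SET SERDE_ID = ? WHERE SD_ID = ?", (new_id, sd_id))
    else:
        # Exclusive row: updating in place affects nobody else.
        db.execute(
            "UPDATE SERDES SET SLIB = ? WHERE SERDE_ID = ?", (new_slib, serde_id))

def slib_of(db, sd_id):
    """Serde library class currently used by a storage descriptor."""
    return db.execute(
        "SELECT s.SLIB FROM SDS d JOIN SERDES s ON d.SERDE_ID = s.SERDE_ID "
        "WHERE d.SD_ID = ?", (sd_id,)).fetchone()[0]
```

After set_partition_serde(db, 2, 
"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"), storage descriptor 1 
still resolves to OrcSerde and SERDES holds two rows.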

  was:
We can reuse serde info as how we reuse column descriptors. (HIVE-2246)

Currently, each storage descriptor has an associated and exclusive serde info, 
but the partitions' serde infos are mostly just the same as the table's. By 
reusing the serde info, we can save some database storage and possibly 
enhance the query performance from HMS to the backend database.

 


> Deduplicate SerDe Info
> ----------------------
>
>                 Key: HIVE-25779
>                 URL: https://issues.apache.org/jira/browse/HIVE-25779
>             Project: Hive
>          Issue Type: New Feature
>          Components: Standalone Metastore
>            Reporter: Yu-Wen Lai
>            Assignee: Yu-Wen Lai
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The proposal is to reuse serde info in the same way that we reuse column 
> descriptors (HIVE-2246).
> Currently, we store the metadata for partitions as PARTITIONS (N partitions) 
> -> SDS (N locations) -> SERDES (N entries). However, all the SERDES entries 
> for the partitions in a table are the same unless a serde is explicitly 
> specified. That is, each storage descriptor has an associated and exclusive 
> serde info, but the partitions' serde infos are mostly the same as the 
> table's. By reusing the serde info, we can save database storage and improve 
> the performance of queries from HMS to the backend database.
> For backward compatibility, we also need to introduce a config for this 
> feature, because there will be issues if an old HMS instance and a new HMS 
> instance with this feature are running together. With this feature, we need 
> to check whether other storage descriptors still reference a serde before 
> deleting it, but an old instance will just delete it.
> The other thing we need to take care of is custom serdes. If a partition's 
> serde is modified, we need to create a new record in SERDES so that we don't 
> interfere with other partitions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
