[ 
https://issues.apache.org/jira/browse/HUDI-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454891#comment-17454891
 ] 

Vinoth Chandar commented on HUDI-1449:
--------------------------------------

This is already supported.

> Support for _hoodie_record_key as a virtual column 
> ---------------------------------------------------
>
>                 Key: HUDI-1449
>                 URL: https://issues.apache.org/jira/browse/HUDI-1449
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Common Core
>            Reporter: Nishith Agarwal
>            Assignee: Abhishek Modi
>            Priority: Major
>
> Context:
> Currently, _hoodie_record_key is written to DFS, as a column in the Parquet
> file. In our production systems at Uber, however, _hoodie_record_key
> contains data that can be found in a different column (or set of columns).
> This means that we are storing duplicated data.
> Proposal:
> In the interest of improving storage efficiency, we could add confs /
> abstract classes that can construct the _hoodie_record_key given other
> columns. That way we do not have to store duplicated data on DFS.
>
> RFC: https://cwiki.apache.org/confluence/display/HUDI/RFC+-+21+%3A+Allow+HoodieRecordKey+to+be+Virtual
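
The proposal quoted above amounts to deriving the record key on read from columns that are already stored, rather than persisting it as its own column. A minimal sketch of that idea follows. It is not Hudi's implementation: the function name `virtual_record_key`, the `field:value` key layout, and the sample column names are all assumptions made for illustration.

```python
from typing import Mapping, Sequence


def virtual_record_key(
    row: Mapping[str, object],
    key_fields: Sequence[str],
    sep: str = ",",
) -> str:
    """Rebuild a record key from columns already present in the row.

    Hypothetical helper: the "field:value" layout is an assumption,
    not a format taken from the issue or from Hudi itself.
    """
    parts = []
    for field in key_fields:
        value = row.get(field)
        if value is None:
            # A key cannot be derived if one of its source columns is absent or null.
            raise ValueError(f"key field {field!r} is missing or null")
        parts.append(f"{field}:{value}")
    return sep.join(parts)


# Sample row with made-up columns; no separate key column is stored.
row = {"trip_id": "t-42", "city": "sf", "fare": 12.5}
print(virtual_record_key(row, ["trip_id"]))
print(virtual_record_key(row, ["city", "trip_id"]))
```

Because the key is recomputed from the configured fields, those fields have to be present and non-null in every record, which is why the sketch raises an error instead of emitting a partial key.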



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
