+1. This would be a good option to have. If everybody agrees, please go
ahead with an RFC and we can discuss the details there.
Balaji.V

On Tuesday, August 18, 2020, 04:37:18 PM PDT, Abhishek Modi
<[email protected]> wrote:

Hi everyone!

I was hoping to discuss adding support for making `_hoodie_record_key` a
virtual column :)

Context:
Currently, _hoodie_record_key is written to DFS as a column in the Parquet
file. In our production systems at Uber, however, _hoodie_record_key
contains data that can be found in a different column (or set of columns).
This means that we are storing duplicated data.

Proposal:
In the interest of improving storage efficiency, we could add confs /
abstract classes that can construct the _hoodie_record_key given other
columns. That way we do not have to store duplicated data on DFS.
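
To make the idea concrete, here is a rough sketch in Python of what such an
abstract class could look like. It is not Hudi code: the class names, the
`build_key` method, the `:` separator, and the sample columns (`city_id`,
`trip_id`, `fare`) are all made up for illustration, and a real version
would presumably live on the Java side and be wired in through a conf.

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping, Sequence


class VirtualRecordKeyBuilder(ABC):
    """Hypothetical interface: derive the record key from stored columns."""

    @abstractmethod
    def build_key(self, row: Mapping[str, Any]) -> str:
        """Return the record key for one row without reading a key column."""


class ConcatColumnsKeyBuilder(VirtualRecordKeyBuilder):
    """Hypothetical implementation: join the configured columns in order."""

    def __init__(self, columns: Sequence[str], separator: str = ":") -> None:
        if not columns:
            raise ValueError("at least one source column is required")
        self.columns = list(columns)
        self.separator = separator

    def build_key(self, row: Mapping[str, Any]) -> str:
        # Fail loudly if a source column is absent or null, since the key
        # cannot be reconstructed from incomplete data.
        missing = [c for c in self.columns if row.get(c) is None]
        if missing:
            raise KeyError(f"missing key columns: {missing}")
        return self.separator.join(str(row[c]) for c in self.columns)


# The stored row has no _hoodie_record_key column; the key is computed on read.
row = {"trip_id": "t-42", "city_id": 7, "fare": 12.5}
builder = ConcatColumnsKeyBuilder(["city_id", "trip_id"])
record_key = builder.build_key(row)  # "7:t-42"
```

The column list and separator are the kind of thing a conf could carry,
while the abstract class leaves room for keys that need custom logic.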

Any thoughts on this?

Best,
Modi
  
