[ https://issues.apache.org/jira/browse/HUDI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-2681:
--------------------------------------
    Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3  (was: 0.13.0 Final Sprint 2)

> Make hoodie record_key and preCombine_key optional
> --------------------------------------------------
>
>                 Key: HUDI-2681
>                 URL: https://issues.apache.org/jira/browse/HUDI-2681
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Common Core, spark-sql, writer-core
>            Reporter: Vinoth Govindarajan
>            Assignee: Lokesh Jain
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> At present, Hudi needs a record key and a preCombine key to create a Hudi
> dataset, which restricts the kinds of datasets we can create using Hudi.
>
> To increase adoption of the Hudi file format across all kinds of derived
> datasets, similar to Parquet/ORC, we need to offer users more flexibility.
> I understand that the record key is used for the upsert primitive and that
> the preCombine key is needed to break ties and deduplicate, but there are
> event data and other datasets without any primary key (append-only
> datasets) that can benefit from Hudi, since the Hudi ecosystem offers other
> features such as snapshot isolation, indexes, clustering, delta streamer,
> etc., which could be applied to any dataset without a record key.
>
> The idea of this proposal is to make both the record key and the preCombine
> key optional, to allow a variety of new use cases on top of Hudi.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)