[ https://issues.apache.org/jira/browse/HUDI-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-5272:
---------------------------------
    Labels: pull-request-available  (was: )

> Align with Flink to support no_precombine in spark
> --------------------------------------------------
>
>                 Key: HUDI-5272
>                 URL: https://issues.apache.org/jira/browse/HUDI-5272
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: kazdy
>            Assignee: kazdy
>            Priority: Major
>              Labels: pull-request-available
>
> Flink supports {{public static final String NO_PRE_COMBINE = "no_precombine";}}
> (although it is not documented) for inserts and updates.
> This was introduced by [#3874|https://github.com/apache/hudi/pull/3874].
> https://issues.apache.org/jira/browse/HUDI-2633
> {{When the precombine field is not specified, we use the proctime semantics,
> that means, the records come later are more fresh}}
> There is an argument against it: without a precombine field, records cannot be
> deduplicated properly for updates. At the same time, Hudi allows a non-strict
> insert mode that breaks PK uniqueness.
> Users can make an informed decision and either handle duplicates on their own
> or apply their own precombine logic (window functions, etc.) before triggering
> the Hudi write.
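
The "bring your own precombine logic" option from the description can be sketched roughly as follows. This is a minimal plain-Python illustration, not Hudi or Spark code: the function name `precombine_latest` and the field names `id` and `ts` are assumptions made up for the example. It keeps one record per key, preferring the largest ordering value and letting the later-arriving record win ties, which is roughly the effect a window function over the record key would have when applied before the write.

```python
from typing import Any, Dict, Iterable, List


def precombine_latest(
    records: Iterable[Dict[str, Any]],
    key_field: str = "id",        # hypothetical record key field
    ordering_field: str = "ts",   # hypothetical ordering (precombine) field
) -> List[Dict[str, Any]]:
    """Keep one record per key: the one with the largest ordering value.

    On ties, the record that appears later in the input wins.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for record in records:
        key = record[key_field]
        current = latest.get(key)
        # ">=" lets a later record replace an earlier one with an equal ordering value.
        if current is None or record[ordering_field] >= current[ordering_field]:
            latest[key] = record
    return list(latest.values())


# Illustrative batch with duplicate keys (made-up data).
batch = [
    {"id": 1, "ts": 10, "value": "a"},
    {"id": 2, "ts": 5, "value": "b"},
    {"id": 1, "ts": 12, "value": "c"},
    {"id": 2, "ts": 5, "value": "d"},
]
deduplicated = precombine_latest(batch)
```

With deduplication done up front like this, the write itself would no longer need a precombine field to resolve duplicates within the batch.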