[ 
https://issues.apache.org/jira/browse/HUDI-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kazdy updated HUDI-5848:
------------------------
    Description: 
Starting from 0.13, the precombine field is optional in Spark.
Previously this was only available in Flink, but in Flink COMBINE_BEFORE_UPSERT 
is set to false by default, and if no precombine field is provided, upserts can 
be done without any configuration changes.

In Hudi + Spark, on the other hand, users must first explicitly set the 
COMBINE_BEFORE_UPSERT option to false in order to do upserts in the absence of 
a precombine field.
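
For illustration, the manual workaround in Spark might look like the options 
below. This is only a sketch: it assumes COMBINE_BEFORE_UPSERT maps to the 
`hoodie.combine.before.upsert` key, and the table name and record key are 
made-up placeholders.

```python
# Writer options for a Spark upsert into a CoW table with no precombine field.
# The table name and record key are hypothetical placeholders.
hudi_options = {
    "hoodie.table.name": "example_table",             # hypothetical
    "hoodie.datasource.write.recordkey.field": "id",  # hypothetical
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    # "hoodie.datasource.write.precombine.field" is deliberately not set,
    # so combine-before-upsert has to be switched off by hand.
    "hoodie.combine.before.upsert": "false",
}

# With a SparkSession and a DataFrame `df` available, the options would be
# handed to the writer roughly like this (base_path is a placeholder):
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```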

As a Hudi user, if no precombine field is provided, I would like Hudi to 
automatically set COMBINE_BEFORE_UPSERT to the appropriate value, to provide a 
seamless experience.

I assume the precombine field can be optional only if the table type is CoW. 
For MoR, precombine is required for it to work properly, so it's ok to throw an 
error in the absence of precombine when the operation is upsert.
Therefore this should work only for CoW.
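
The requested behaviour could be sketched roughly as follows. This is not Hudi 
code: `apply_precombine_defaults` is a hypothetical helper, and the fallbacks 
used when the operation or table type is missing (upsert and COPY_ON_WRITE) 
are assumptions made for the example.

```python
PRECOMBINE_KEY = "hoodie.datasource.write.precombine.field"
COMBINE_KEY = "hoodie.combine.before.upsert"
TABLE_TYPE_KEY = "hoodie.datasource.write.table.type"
OPERATION_KEY = "hoodie.datasource.write.operation"


def apply_precombine_defaults(options):
    """Return a copy of `options` with the proposed default applied.

    Hypothetical helper that only illustrates the request.
    """
    resolved = dict(options)
    has_precombine = bool(resolved.get(PRECOMBINE_KEY))
    # Assumed fallback: treat a missing operation as an upsert.
    is_upsert = resolved.get(OPERATION_KEY, "upsert") == "upsert"
    if has_precombine or not is_upsert:
        # Nothing to adjust: keep whatever the user configured.
        return resolved

    # Assumed fallback: treat a missing table type as CoW.
    table_type = resolved.get(TABLE_TYPE_KEY, "COPY_ON_WRITE")
    if table_type == "MERGE_ON_READ":
        raise ValueError(
            "precombine field is required for upserts into MoR tables"
        )

    # CoW upsert without precombine: turn combining off unless the user
    # already set the option explicitly.
    resolved.setdefault(COMBINE_KEY, "false")
    return resolved
```

An explicit user setting is left untouched here (`setdefault`), so the 
automatic value only applies when the option was not provided at all.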

  was:
Starting from 0.13, the precombine field is optional in Spark.
Previously this was only available in Flink, but in Flink COMBINE_BEFORE_UPSERT 
is set to false by default, and if no precombine field is provided, upserts can 
be done without any configuration changes.

In Hudi + Spark, on the other hand, users must first explicitly set the 
COMBINE_BEFORE_UPSERT option to false in order to do upserts in the absence of 
a precombine field.

As a Hudi user, if no precombine field is provided, I would like Hudi to 
automatically set COMBINE_BEFORE_UPSERT to the appropriate value, to provide a 
seamless experience.

I assume the precombine field can be optional only if the table type is CoW. 
For MoR, precombine is required for it to work properly.
Therefore this should work only for CoW.


> If no precombine field is provided make COMBINE_BEFORE_UPSERT=false 
> automatically
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-5848
>                 URL: https://issues.apache.org/jira/browse/HUDI-5848
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: dev-experience
>            Reporter: kazdy
>            Assignee: kazdy
>            Priority: Minor
>             Fix For: 0.13.1
>
>
> Starting from 0.13, the precombine field is optional in Spark.
> Previously this was only available in Flink, but in Flink 
> COMBINE_BEFORE_UPSERT is set to false by default, and if no precombine field 
> is provided, upserts can be done without any configuration changes.
> In Hudi + Spark, on the other hand, users must first explicitly set the 
> COMBINE_BEFORE_UPSERT option to false in order to do upserts in the absence 
> of a precombine field.
> As a Hudi user, if no precombine field is provided, I would like Hudi to 
> automatically set COMBINE_BEFORE_UPSERT to the appropriate value, to provide 
> a seamless experience.
> I assume the precombine field can be optional only if the table type is CoW. 
> For MoR, precombine is required for it to work properly, so it's ok to throw 
> an error in the absence of precombine when the operation is upsert.
> Therefore this should work only for CoW.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
