[jira] [Updated] (HUDI-8856) CDC queries with start instant archived (in history) produce incorrect result

Lin Liu (Jira) Fri, 10 Jan 2025 13:35:46 -0800


     [ 
https://issues.apache.org/jira/browse/HUDI-8856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Lin Liu updated HUDI-8856:
--------------------------
    Description: 
How to reproduce:

1. Use the CDC query example in the quick start: 
[https://hudi.apache.org/docs/0.15.0/quick-start-guide]

2. Use the following settings to create archived commits, and run more update 
queries:
{code:java}
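import org.apache.spark.sql.SaveMode.Append
import org.apache.spark.sql.functions.col

// Aggressive archival and cleaning settings, so that early commits are archived quickly.
val opts: Map[String, String] = Map(
  "hoodie.archive.automatic" -> "true",
  "hoodie.keep.max.commits" -> "3",
  "hoodie.keep.min.commits" -> "2",
  // Value as reported; this option is a boolean, so "true" may have been intended.
  "hoodie.clean.automatic" -> "clean",
  "hoodie.clean.policy" -> "KEEP_LATEST_COMMITS",
  "hoodie.clean.trigger.max.commits" -> "1",
  "hoodie.clean.commits.retained" -> "3")

// Double every fare and upsert the result; each run adds one more commit.
// basePath and tableName are the values defined in the quick start.
val updatesDf = spark.read.format("hudi").load(basePath).
  withColumn("fare", col("fare") * 2)
updatesDf.write.format("hudi").
  option("hoodie.datasource.write.operation", "upsert").
  option("hoodie.datasource.write.partitionpath.field", "city").
  option("hoodie.table.cdc.enabled", "true").
  option("hoodie.table.name", tableName).
  options(opts).
  mode(Append).
  save(basePath)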

 {code}
 

Meanwhile, the incremental query does not produce the correct result either.
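
The steps above cover only the write side. For reference, a minimal sketch of the CDC read, modelled on the quick start's CDC query. It assumes the same spark-shell session ({{spark}} and {{basePath}} already defined). The {{beginInstant}} name and its value "0" are illustrative assumptions, not the reporter's exact query; any begin instant that has since been archived should exercise the same path.
{code:java}
// Assumes the quick-start spark-shell session: `spark` and `basePath` are already defined.
// Hypothetical begin instant: "0" sorts before every commit, including archived ones.
val beginInstant = "0"

// Incremental read in CDC format, starting from the (archived) begin instant.
val cdcDf = spark.read.format("hudi").
  option("hoodie.datasource.query.type", "incremental").
  option("hoodie.datasource.query.incremental.format", "cdc").
  option("hoodie.datasource.read.begin.instanttime", beginInstant).
  load(basePath)

cdcDf.show(false)
{code}
Dropping the {{hoodie.datasource.query.incremental.format}} option gives the plain incremental query, which can be compared against the same begin instant.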

  was:
How to reproduce:

1. Use the CDC query example in the quick start: 
https://hudi.apache.org/docs/0.15.0/quick-start-guide

2. Use the following settings to create archived commits, and run more update 
queries:
{code:java}
val opts: Map[String, String] = Map(
  "hoodie.archive.automatic" -> "true",
  "hoodie.keep.max.commits" -> "3",
  "hoodie.keep.min.commits" -> "2",
  "hoodie.clean.automatic" -> "clean",
  "hoodie.clean.policy" -> "KEEP_LATEST_COMMITS",
  "hoodie.clean.trigger.max.commits" -> "1",
  "hoodie.clean.commits.retained" -> "3") 
val updatesDf = spark.read.format("hudi").load(basePath).withColumn("fare", 
col("fare") * 2)
updatesDf.write.format("hudi").
  option("hoodie.datasource.write.operation", "upsert").
  option("hoodie.datasource.write.partitionpath.field", "city").
  option("hoodie.table.cdc.enabled", "true").
  option("hoodie.table.name", tableName).
  options(opts).
  mode(Append).
  save(basePath)

 {code}
 


> CDC queries with start instant archived (in history) produce incorrect result
> -----------------------------------------------------------------------------
>
>                 Key: HUDI-8856
>                 URL: https://issues.apache.org/jira/browse/HUDI-8856
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Lin Liu
>            Assignee: Lin Liu
>            Priority: Blocker
>             Fix For: 1.1.0
>
>
> How to reproduce:
> 1. Use the CDC query example in the quick start: 
> [https://hudi.apache.org/docs/0.15.0/quick-start-guide]
> 2. Use the following settings to create archived commits, and run more update 
> queries:
> {code:java}
> val opts: Map[String, String] = Map(
>   "hoodie.archive.automatic" -> "true",
>   "hoodie.keep.max.commits" -> "3",
>   "hoodie.keep.min.commits" -> "2",
>   "hoodie.clean.automatic" -> "clean",
>   "hoodie.clean.policy" -> "KEEP_LATEST_COMMITS",
>   "hoodie.clean.trigger.max.commits" -> "1",
>   "hoodie.clean.commits.retained" -> "3") 
> val updatesDf = spark.read.format("hudi").load(basePath).withColumn("fare", 
> col("fare") * 2)
> updatesDf.write.format("hudi").
>   option("hoodie.datasource.write.operation", "upsert").
>   option("hoodie.datasource.write.partitionpath.field", "city").
>   option("hoodie.table.cdc.enabled", "true").
>   option("hoodie.table.name", tableName).
>   options(opts).
>   mode(Append).
>   save(basePath)
>  {code}
>  
> Meanwhile, the incremental query does not produce the correct result either.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
