[
https://issues.apache.org/jira/browse/HUDI-8856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lin Liu updated HUDI-8856:
--------------------------
Description:
How to reproduce:
1. Use the CDC query example in the quick start:
[https://hudi.apache.org/docs/0.15.0/quick-start-guide]
2. Use the following settings to force commits to be archived, then run several
more update queries:
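{code:java}
// Imports already present in the quick-start session; repeated here for completeness.
import org.apache.spark.sql.SaveMode.Append
import org.apache.spark.sql.functions.col

// Aggressive archival and cleaning settings so that commits are archived quickly.
val opts: Map[String, String] = Map(
  "hoodie.archive.automatic" -> "true",
  "hoodie.keep.max.commits" -> "3",
  "hoodie.keep.min.commits" -> "2",
  // Value as reported; this option is documented as a boolean.
  "hoodie.clean.automatic" -> "clean",
  "hoodie.clean.policy" -> "KEEP_LATEST_COMMITS",
  "hoodie.clean.trigger.max.commits" -> "1",
  "hoodie.clean.commits.retained" -> "3")

// Double every fare, then upsert. tableName and basePath come from the quick start.
val updatesDf = spark.read.format("hudi").load(basePath).
  withColumn("fare", col("fare") * 2)
updatesDf.write.format("hudi").
  option("hoodie.datasource.write.operation", "upsert").
  option("hoodie.datasource.write.partitionpath.field", "city").
  option("hoodie.table.cdc.enabled", "true").
  option("hoodie.table.name", tableName).
  options(opts).
  mode(Append).
  save(basePath)
{code}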
Meanwhile, the incremental query does not produce the correct result either.
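For reference, the two reads being compared might look like the sketch below. It assumes the option names from the 0.15.0 quick start, reuses basePath from that session, and uses "0" as a hypothetical start instant that precedes every commit, including the archived ones. It is only an illustration of the queries, not the exact commands used when the issue was observed.
{code:java}
// Assumed start instant: earlier than every commit on the table.
val beginTime = "0"

// CDC query starting from an instant that has since been archived.
spark.read.format("hudi").
  option("hoodie.datasource.query.type", "incremental").
  option("hoodie.datasource.query.incremental.format", "cdc").
  option("hoodie.datasource.read.begin.instanttime", beginTime).
  load(basePath).
  show(false)

// Plain incremental query over the same range, for comparison.
spark.read.format("hudi").
  option("hoodie.datasource.query.type", "incremental").
  option("hoodie.datasource.read.begin.instanttime", beginTime).
  load(basePath).
  show(false)
{code}
Running both after each round of updates should show whether the output changes once the earliest commits move to the archived timeline.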
> CDC queries with start instant archived (in history) produce incorrect result
> -----------------------------------------------------------------------------
>
> Key: HUDI-8856
> URL: https://issues.apache.org/jira/browse/HUDI-8856
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Lin Liu
> Assignee: Lin Liu
> Priority: Blocker
> Fix For: 1.1.0
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)