[
https://issues.apache.org/jira/browse/HUDI-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912113#comment-17912113
]
Davis Zhang commented on HUDI-8818:
-----------------------------------
I could not reproduce the issue.
First, create the table using Spark 3.4 and Hudi 0.14.1 and insert values:
```
➜ ~ echo $SPARK_HOME
/Users/zhanyeha/spark-3.4.4-bin-hadoop3
➜ ~ spark-sql --packages
org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:0.14.1 --conf
'spark.serializer=org.apache.spark.serializer.KryoSerializer' --conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
--conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
--conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar'
--conf 'spark.sql.catalogImplementation=in-memory'
spark-sql (default)> set
hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
hoodie.datasource.write.payload.class
org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.836 seconds, Fetched 1 row(s)
spark-sql (default)> set
hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
hoodie.compaction.payload.class
org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.01 seconds, Fetched 1 row(s)
spark-sql (default)>
> CREATE TABLE hudi_table_mor_single_partition22 (
> id INT,
> name STRING,
> price DOUBLE,
> ts BIGINT
> ) USING hudi
> LOCATION
'file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22/'
> PARTITIONED BY (name)
> TBLPROPERTIES (
> type = 'mor',
> primaryKey = 'id',
> preCombineField = 'ts',
>
hoodie.datasource.write.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload',
>
hoodie.compaction.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload'
> );
25/01/10 12:36:51 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR,
please set it as the dir of hudi-defaults.conf
25/01/10 12:36:51 WARN DFSPropertiesConfiguration: Properties file
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/01/10 12:36:51 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
Time taken: 0.231 seconds
spark-sql (default)>
> -- Insert records with ts 100
> INSERT INTO hudi_table_mor_single_partition22
> SELECT 1 as id, 'A' as name, 10.0 as price, 100 as ts
> UNION ALL
> SELECT 2, 'B', 20.0, 100;
25/01/10 12:36:52 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
25/01/10 12:36:52 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
25/01/10 12:36:53 WARN MetricsConfig: Cannot locate configuration: tried
hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
# WARNING: Unable to get Instrumentation. Dynamic Attach failed. You may add
this JAR as -javaagent manually, or supply -Djdk.attach.allowAttachSelf
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with
module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException:
Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense
failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense
failed.]
25/01/10 12:36:56 WARN HoodieSparkSqlWriterInternal: Closing write client
Time taken: 4.818 seconds
spark-sql (default)> select * FROM hudi_table_mor_single_partition22 ORDER BY
id;
20250110123652107 20250110123652107_0_0 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0_0-23-27_20250110123652107.parquet 1 10.0
100 A
20250110123652107 20250110123652107_1_0 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0_1-23-28_20250110123652107.parquet 2 20.0
100 B
Time taken: 0.553 seconds, Fetched 2 row(s)
spark-sql (default)>
```
Then use Spark 3.5 with a Hudi 1.1.0-SNAPSHOT bundle to read from the table and
insert new records updating the existing ones. No issues:
```
➜ ~ echo $SPARK_HOME
/Users/zhanyeha/spark-3.5.4-bin-hadoop3
${SPARK_HOME}/bin/spark-sql \
--jars
/Users/zhanyeha/hudi-oss/packaging/hudi-spark-bundle/target/hudi-spark3.5-bundle_2.12-1.1.0-SNAPSHOT.jar
\
--conf
'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
--conf
'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
\
--conf 'spark.sql.catalogImplementation=in-memory' \
--conf 'spark.executor.heartbeat.maxFailures=999999999' \
--conf spark.sql.defaultCatalog=spark_catalog --conf
spark.driver.extraJavaOptions='-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005'
spark-sql (default)>
> CREATE TABLE hudi_table_mor_single_partition22
> USING hudi
> TBLPROPERTIES (
>
hoodie.datasource.write.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload',
>
hoodie.compaction.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload'
> )
> LOCATION
'file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22';
25/01/10 12:37:31 WARN ConfigUtils: The configuration key
'hoodie.compaction.record.merger.strategy' has been deprecated and may be
removed in the future. Please use the new key 'hoodie.record.merge.strategy.id'
instead.
25/01/10 12:37:31 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.table.initial.version will
be ignored.
25/01/10 12:37:31 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.table.keygenerator.type
will be ignored.
25/01/10 12:37:31 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.record.merge.mode will be
ignored.
Time taken: 0.073 seconds
spark-sql (default)> select * from hudi_table_mor_single_partition22;
20250110123652107 20250110123652107_0_0 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0_0-23-27_20250110123652107.parquet 1 10.0
100 A
20250110123652107 20250110123652107_1_0 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0_1-23-28_20250110123652107.parquet 2 20.0
100 B
Time taken: 1.038 seconds, Fetched 2 row(s)
spark-sql (default)> INSERT INTO hudi_table_mor_single_partition22
> SELECT 1 as id, 'A' as name, 30.0 as price, 99 as ts
> UNION ALL
> SELECT 2, 'B', 40.0, 99;
25/01/10 12:38:39 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet
fully supported by the writer. Please expect some unexpected behavior, until
its fully implemented.
25/01/10 12:38:40 WARN MetricsConfig: Cannot locate configuration: tried
hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
25/01/10 12:38:40 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet
fully supported by the writer. Please expect some unexpected behavior, until
its fully implemented.
25/01/10 12:38:40 WARN BaseHoodieCompactionPlanGenerator: No operations are
retrieved for
file:/tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22 for table
file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
Time taken: 3.059 seconds
spark-sql (default)> select * FROM hudi_table_mor_single_partition22 ORDER BY
id;
20250110123652107 20250110123652107_0_0 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0_0-23-27_20250110123652107.parquet 1 10.0
100 A
20250110123652107 20250110123652107_1_0 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0_1-23-28_20250110123652107.parquet 2 20.0
100 B
Time taken: 0.378 seconds, Fetched 2 row(s)
spark-sql (default)> INSERT INTO hudi_table_mor_single_partition22
> SELECT 1 as id, 'A' as name, 30.0 as price, 101 as ts
> UNION ALL
> SELECT 2, 'B', 40.0, 101;
Time taken: 1.087 seconds
spark-sql (default)> select * FROM hudi_table_mor_single_partition22 ORDER BY
id;
20250110123901621 20250110123901621_0_3 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0 1 30.0 101 A
20250110123901621 20250110123901621_1_4 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0 2 40.0 101 B
Time taken: 0.157 seconds, Fetched 2 row(s)
spark-sql (default)>
```
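The behavior above, where the ts=99 insert leaves the table unchanged while the ts=101 insert updates it, is what DefaultHoodieRecordPayload's preCombine-based ordering is expected to produce. A simplified, hypothetical Python sketch of that rule (field names and tie-breaking are assumptions for illustration, not Hudi's actual code):

```python
# Simplified sketch of DefaultHoodieRecordPayload-style merging: the incoming
# record replaces the stored one only when its ordering (preCombine) value is
# at least as large; otherwise the stored record is kept. (Assumed semantics
# for illustration; not Hudi's implementation.)
def merge(existing, incoming, precombine="ts"):
    return incoming if incoming[precombine] >= existing[precombine] else existing

stored = {"id": 1, "name": "A", "price": 10.0, "ts": 100}

# ts=99 is older than the stored ts=100, so the update is ignored
assert merge(stored, {"id": 1, "name": "A", "price": 30.0, "ts": 99}) == stored

# ts=101 is newer, so the incoming record wins
updated = merge(stored, {"id": 1, "name": "A", "price": 30.0, "ts": 101})
assert updated["price"] == 30.0
```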
> Hudi 1.0 cannot use SQL to write older versioned Hudi table
> -----------------------------------------------------------
>
> Key: HUDI-8818
> URL: https://issues.apache.org/jira/browse/HUDI-8818
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Shawn Chang
> Assignee: Davis Zhang
> Priority: Blocker
> Fix For: 1.0.1
>
>
> When using Hudi 1.0 + Spark SQL to write a table created by Hudi 0.14 + Spark
> 3.5.0 DF, we noticed that the INSERT query would fail due to a database config
> conflict
> {code:java}
> Config conflict(key current value existing value):
> hoodie.database.name: yxchang_nolf
> org.apache.hudi.exception.HoodieException: Config conflict(key current
> value existing value):
> hoodie.database.name: yxchang_nolf
> at
> org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:256)
> at
> org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:245)
> at
> org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$write$1(HoodieSparkSqlWriter.scala:190)
> {code}
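For context, the quoted failure comes from Hudi's table-config validation (`HoodieWriterUtils.validateTableConfig` in the stack trace), which compares the writer's current options against the table's persisted properties. A hypothetical, much-simplified Python sketch of that kind of check (names and message format are assumptions, not Hudi's code):

```python
# Simplified illustration of a validateTableConfig-style check: any key present
# in both the writer's current options and the table's persisted properties
# with differing values is reported as a conflict. (Illustrative only.)
def validate_table_config(current_opts, existing_props):
    conflicts = [
        (key, current_opts[key], existing_props[key])
        for key in sorted(current_opts.keys() & existing_props.keys())
        if current_opts[key] != existing_props[key]
    ]
    if conflicts:
        detail = "\n".join(f"{k}: {cur} {old}" for k, cur, old in conflicts)
        raise ValueError(
            "Config conflict(key current value existing value):\n" + detail
        )

# A mismatched hoodie.database.name, as in the report, would raise here.
```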
--
This message was sent by Atlassian Jira
(v8.20.10#820010)