[
https://issues.apache.org/jira/browse/HUDI-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912113#comment-17912113
]
Davis Zhang edited comment on HUDI-8818 at 1/10/25 9:26 PM:
------------------------------------------------------------
I could not reproduce the issue. Most likely the two sessions are not using the
same database name.
First, create the table with Spark 3.4 and Hudi 0.14.1 and insert values in the
"default" database. (I also repeated this step with Hudi 0.15.0 + Spark 3.5; no
difference.)
```
➜ ~ echo $SPARK_HOME
/Users/zhanyeha/spark-3.4.4-bin-hadoop3
➜ ~ spark-sql --packages org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:0.14.1 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
  --conf 'spark.sql.catalogImplementation=in-memory'
spark-sql (default)> set
hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
hoodie.datasource.write.payload.class
org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.836 seconds, Fetched 1 row(s)
spark-sql (default)> set
hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
hoodie.compaction.payload.class
org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.01 seconds, Fetched 1 row(s)
spark-sql (default)>
> CREATE TABLE hudi_table_mor_single_partition22 (
> id INT,
> name STRING,
> price DOUBLE,
> ts BIGINT
> ) USING hudi
> LOCATION
'file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22/'
> PARTITIONED BY (name)
> TBLPROPERTIES (
> type = 'mor',
> primaryKey = 'id',
> preCombineField = 'ts',
>
hoodie.datasource.write.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload',
>
hoodie.compaction.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload'
> );
25/01/10 12:36:51 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR,
please set it as the dir of hudi-defaults.conf
25/01/10 12:36:51 WARN DFSPropertiesConfiguration: Properties file
file:///etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/01/10 12:36:51 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
Time taken: 0.231 seconds
spark-sql (default)>
> -- Insert records with ts 100
> INSERT INTO hudi_table_mor_single_partition22
> SELECT 1 as id, 'A' as name, 10.0 as price, 100 as ts
> UNION ALL
> SELECT 2, 'B', 20.0, 100;
25/01/10 12:36:52 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
25/01/10 12:36:52 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
25/01/10 12:36:53 WARN MetricsConfig: Cannot locate configuration: tried
hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
# WARNING: Unable to get Instrumentation. Dynamic Attach failed. You may add
this JAR as -javaagent manually, or supply -Djdk.attach.allowAttachSelf
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with
module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException:
Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense
failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense
failed.]
25/01/10 12:36:56 WARN HoodieSparkSqlWriterInternal: Closing write client
Time taken: 4.818 seconds
spark-sql (default)> select * FROM hudi_table_mor_single_partition22 ORDER BY
id;
20250110123652107 20250110123652107_0_0 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0_0-23-27_20250110123652107.parquet 1 10.0
100 A
20250110123652107 20250110123652107_1_0 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0_1-23-28_20250110123652107.parquet 2 20.0
100 B
Time taken: 0.553 seconds, Fetched 2 row(s)
spark-sql (default)>
```
Then, using Spark 3.5 with a Hudi 1.1.0-SNAPSHOT bundle, I load the table
created above with a CREATE TABLE statement. Note that I am running in a
database with the same name, "default"; if I do it in another database such as
"myDb", the CREATE statement fails.
After that I tried SELECT and INSERT; no issue found.
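Before starting the second session, one way to sanity-check the database-name
hypothesis is to compare the `hoodie.database.name` recorded in the table's
`.hoodie/hoodie.properties` against the database the session will use. A minimal
sketch (the parser and the sample contents are illustrative, not Hudi code; an
absent `hoodie.database.name` is assumed to mean the implicit "default"):

```python
def read_hoodie_properties(text):
    """Parse Java-properties-style key=value content, skipping comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def database_matches(props, session_db):
    # Assumption: a missing hoodie.database.name behaves like "default".
    return props.get("hoodie.database.name", "default") == session_db

# Hypothetical hoodie.properties contents for the table in this repro.
sample = """
#Updated at 2025-01-10
hoodie.table.name=hudi_table_mor_single_partition22
hoodie.database.name=default
"""
props = read_hoodie_properties(sample)
print(database_matches(props, "default"))  # True
print(database_matches(props, "myDb"))     # False
```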
```
➜ ~ echo $SPARK_HOME
/Users/zhanyeha/spark-3.5.4-bin-hadoop3
${SPARK_HOME}/bin/spark-sql \
  --jars /Users/zhanyeha/hudi-oss/packaging/hudi-spark-bundle/target/hudi-spark3.5-bundle_2.12-1.1.0-SNAPSHOT.jar \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.sql.catalogImplementation=in-memory' \
  --conf 'spark.executor.heartbeat.maxFailures=999999999' \
  --conf spark.sql.defaultCatalog=spark_catalog \
  --conf spark.driver.extraJavaOptions='-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005'
spark-sql (default)>
> CREATE TABLE hudi_table_mor_single_partition22
> USING hudi
> TBLPROPERTIES (
>
hoodie.datasource.write.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload',
>
hoodie.compaction.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload'
> )
> LOCATION
'file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22';
25/01/10 12:37:31 WARN ConfigUtils: The configuration key
'hoodie.compaction.record.merger.strategy' has been deprecated and may be
removed in the future. Please use the new key 'hoodie.record.merge.strategy.id'
instead.
25/01/10 12:37:31 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.table.initial.version will
be ignored.
25/01/10 12:37:31 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.table.keygenerator.type
will be ignored.
25/01/10 12:37:31 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.record.merge.mode will be
ignored.
Time taken: 0.073 seconds
spark-sql (default)> select * from hudi_table_mor_single_partition22;
20250110123652107 20250110123652107_0_0 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0_0-23-27_20250110123652107.parquet 1 10.0
100 A
20250110123652107 20250110123652107_1_0 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0_1-23-28_20250110123652107.parquet 2 20.0
100 B
Time taken: 1.038 seconds, Fetched 2 row(s)
spark-sql (default)> INSERT INTO hudi_table_mor_single_partition22
> SELECT 1 as id, 'A' as name, 30.0 as price, 99 as ts
> UNION ALL
> SELECT 2, 'B', 40.0, 99;
25/01/10 12:38:39 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet
fully supported by the writer. Please expect some unexpected behavior, until
its fully implemented.
25/01/10 12:38:40 WARN MetricsConfig: Cannot locate configuration: tried
hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
25/01/10 12:38:40 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet
fully supported by the writer. Please expect some unexpected behavior, until
its fully implemented.
25/01/10 12:38:40 WARN BaseHoodieCompactionPlanGenerator: No operations are
retrieved for
file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
for table
file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
Time taken: 3.059 seconds
spark-sql (default)> select * FROM hudi_table_mor_single_partition22 ORDER BY
id;
20250110123652107 20250110123652107_0_0 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0_0-23-27_20250110123652107.parquet 1 10.0
100 A
20250110123652107 20250110123652107_1_0 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0_1-23-28_20250110123652107.parquet 2 20.0
100 B
Time taken: 0.378 seconds, Fetched 2 row(s)
spark-sql (default)> INSERT INTO hudi_table_mor_single_partition22
> SELECT 1 as id, 'A' as name, 30.0 as price, 101 as ts
> UNION ALL
> SELECT 2, 'B', 40.0, 101;
Time taken: 1.087 seconds
spark-sql (default)> select * FROM hudi_table_mor_single_partition22 ORDER BY
id;
20250110123901621 20250110123901621_0_3 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0 1 30.0 101 A
20250110123901621 20250110123901621_1_4 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0 2 40.0 101 B
Time taken: 0.157 seconds, Fetched 2 row(s)
spark-sql (default)>
```
> Hudi 1.0 cannot use SQL to write older versioned Hudi table
> -----------------------------------------------------------
>
> Key: HUDI-8818
> URL: https://issues.apache.org/jira/browse/HUDI-8818
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Shawn Chang
> Assignee: Davis Zhang
> Priority: Blocker
> Fix For: 1.0.1
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> When using Hudi 1.0 + Spark SQL to write a table created by Hudi 0.14 + Spark
> 3.5.0 DF, we noticed that the INSERT query would fail due to a database config
> conflict
> {code:java}
> Config conflict(key current value existing value):
> hoodie.database.name: yxchang_nolf
> org.apache.hudi.exception.HoodieException: Config conflict(key current
> value existing value):
> hoodie.database.name: yxchang_nolf
> at
> org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:256)
> at
> org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:245)
> at
> org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$write$1(HoodieSparkSqlWriter.scala:190)
> {code}
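The conflict above comes from table-config validation at write time. A minimal
Python sketch of that kind of check (illustrative only; the real logic lives in
`HoodieWriterUtils.validateTableConfig` and differs in detail):

```python
class ConfigConflictError(Exception):
    """Raised when a write-time config contradicts the stored table config."""

def validate_table_config(current, existing, keys):
    # Compare each guarded key; collect every mismatch before raising,
    # mirroring the "key / current value / existing value" error shape.
    conflicts = []
    for key in keys:
        cur, old = current.get(key), existing.get(key)
        if cur is not None and old is not None and cur != old:
            conflicts.append((key, cur, old))
    if conflicts:
        lines = [f"{k}: {c} {o}" for k, c, o in conflicts]
        raise ConfigConflictError(
            "Config conflict(key current value existing value):\n"
            + "\n".join(lines)
        )

# Matching database names pass silently; a mismatch raises, as in the report.
validate_table_config(
    {"hoodie.database.name": "default"},
    {"hoodie.database.name": "default"},
    ["hoodie.database.name"],
)
```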
--
This message was sent by Atlassian Jira
(v8.20.10#820010)