[
https://issues.apache.org/jira/browse/HUDI-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912113#comment-17912113
]
Davis Zhang edited comment on HUDI-8818 at 1/10/25 10:55 PM:
-------------------------------------------------------------
I can repro the issue; the root cause is the user not having the right setup on
their side. Once the setup is fixed, the issue is gone.
I tried writer 1 with Hudi 0.14 + Spark 3.4 (and also Hudi 0.15 + Spark 3.5) to
create the table and insert values, then writer 2 with Hudi 1.0 + Spark 3.5 to
load the table from disk and run select/insert; no issue found.
I am able to repro the issue only if writer 2 runs its CREATE TABLE statement in
a different database.
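The failure reported in the issue description below is raised by Hudi's table-config validation (HoodieWriterUtils.validateTableConfig). As an illustration only, here is a minimal Python sketch of that kind of check; this is a hypothetical re-statement of the idea, not Hudi source code:

```python
# Hypothetical sketch (not Hudi source) of a table-config validation:
# every key the writer session supplies must match the value already
# persisted in the table's config, otherwise the write is rejected.

class ConfigConflictError(Exception):
    pass

def validate_table_config(session_config: dict, table_config: dict) -> None:
    """Raise if a session-supplied key disagrees with the stored table config."""
    conflicts = []
    for key, current in session_config.items():
        existing = table_config.get(key)
        if existing is not None and existing != current:
            conflicts.append(f"{key}: {current} {existing}")
    if conflicts:
        raise ConfigConflictError(
            "Config conflict(key current value existing value):\n"
            + "\n".join(conflicts)
        )

# Writer 2 runs in database "myDb", but the table was created in "default",
# so the stored hoodie.database.name no longer matches:
try:
    validate_table_config(
        {"hoodie.database.name": "myDb"},
        {"hoodie.database.name": "default"},
    )
except ConfigConflictError as e:
    print(e)
```

This matches the symptom in the stack trace: the conflict is on hoodie.database.name, which is exactly what differs when the two sessions use different databases.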
*Steps I followed*
First, create the table using Spark 3.4 and Hudi 0.14.0 and insert values in
the "default" database. (I also repeated this step with Hudi 0.15.0 + Spark
3.5; no difference.)
```
➜ ~ echo $SPARK_HOME
/Users/zhanyeha/spark-3.4.4-bin-hadoop3
➜ ~ spark-sql --packages org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:0.14.1 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
  --conf 'spark.sql.catalogImplementation=in-memory'
spark-sql (default)> set
hoodie.datasource.write.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
hoodie.datasource.write.payload.class
org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.836 seconds, Fetched 1 row(s)
spark-sql (default)> set
hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload;
hoodie.compaction.payload.class
org.apache.hudi.common.model.DefaultHoodieRecordPayload
Time taken: 0.01 seconds, Fetched 1 row(s)
spark-sql (default)>
> CREATE TABLE hudi_table_mor_single_partition22 (
> id INT,
> name STRING,
> price DOUBLE,
> ts BIGINT
> ) USING hudi
> LOCATION
'file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22/'
> PARTITIONED BY (name)
> TBLPROPERTIES (
> type = 'mor',
> primaryKey = 'id',
> preCombineField = 'ts',
>
hoodie.datasource.write.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload',
>
hoodie.compaction.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload'
> );
25/01/10 12:36:51 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR,
please set it as the dir of hudi-defaults.conf
25/01/10 12:36:51 WARN DFSPropertiesConfiguration: Properties file
file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/01/10 12:36:51 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
Time taken: 0.231 seconds
spark-sql (default)>
> -- Insert records with ts 100
> INSERT INTO hudi_table_mor_single_partition22
> SELECT 1 as id, 'A' as name, 10.0 as price, 100 as ts
> UNION ALL
> SELECT 2, 'B', 20.0, 100;
25/01/10 12:36:52 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
25/01/10 12:36:52 WARN TableSchemaResolver: Could not find any data file
written for commit, so could not get schema for table
file:/tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
25/01/10 12:36:53 WARN MetricsConfig: Cannot locate configuration: tried
hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
# WARNING: Unable to get Instrumentation. Dynamic Attach failed. You may add
this JAR as -javaagent manually, or supply -Djdk.attach.allowAttachSelf
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with
module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException:
Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense
failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense
failed.]
25/01/10 12:36:56 WARN HoodieSparkSqlWriterInternal: Closing write client
Time taken: 4.818 seconds
spark-sql (default)> select * FROM hudi_table_mor_single_partition22 ORDER BY
id;
20250110123652107 20250110123652107_0_0 1 name=A 75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0_0-23-27_20250110123652107.parquet 1 10.0 100 A
20250110123652107 20250110123652107_1_0 2 name=B 3386d4dc-f107-4403-a72e-5c145fa1ff7e-0_1-23-28_20250110123652107.parquet 2 20.0 100 B
Time taken: 0.553 seconds, Fetched 2 row(s)
spark-sql (default)>
```
Then, using Spark 3.5, I load the table created above with a CREATE TABLE
statement. Note that I am running in a database with the same name, "default";
if I run it in another database such as "myDb", the CREATE statement fails.
After that I tried select and insert; no issue found.
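For reference, a minimal sketch of the failing variant. The database name myDb is hypothetical here; the transcript below shows only the passing "default" case:

```sql
-- Hypothetical failing repro: writer 2 registers the table in a different
-- database than the one it was created in.
CREATE DATABASE IF NOT EXISTS myDb;
USE myDb;

-- The table config persisted on disk carries hoodie.database.name='default',
-- so this CREATE (and subsequent INSERTs) is rejected by Hudi 1.0 with
-- "Config conflict ... hoodie.database.name".
CREATE TABLE hudi_table_mor_single_partition22
USING hudi
LOCATION 'file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22';
```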
```
➜ ~ echo $SPARK_HOME
/Users/zhanyeha/spark-3.5.4-bin-hadoop3
${SPARK_HOME}/bin/spark-sql \
  --jars /Users/zhanyeha/hudi-oss/packaging/hudi-spark-bundle/target/hudi-spark3.5-bundle_2.12-1.1.0-SNAPSHOT.jar \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.sql.catalogImplementation=in-memory' \
  --conf 'spark.executor.heartbeat.maxFailures=999999999' \
  --conf spark.sql.defaultCatalog=spark_catalog \
  --conf spark.driver.extraJavaOptions='-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005'
spark-sql (default)>
> CREATE TABLE hudi_table_mor_single_partition22
> USING hudi
> TBLPROPERTIES (
>
hoodie.datasource.write.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload',
>
hoodie.compaction.payload.class='org.apache.hudi.common.model.DefaultHoodieRecordPayload'
> )
> LOCATION
'file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22';
25/01/10 12:37:31 WARN ConfigUtils: The configuration key
'hoodie.compaction.record.merger.strategy' has been deprecated and may be
removed in the future. Please use the new key 'hoodie.record.merge.strategy.id'
instead.
25/01/10 12:37:31 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.table.initial.version will
be ignored.
25/01/10 12:37:31 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.table.keygenerator.type
will be ignored.
25/01/10 12:37:31 WARN HoodieTableConfig: Table version SIX is lower than or
equal to config's first version EIGHT. Config hoodie.record.merge.mode will be
ignored.
Time taken: 0.073 seconds
spark-sql (default)> select * from hudi_table_mor_single_partition22;
20250110123652107 20250110123652107_0_0 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0_0-23-27_20250110123652107.parquet 1 10.0
100 A
20250110123652107 20250110123652107_1_0 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0_1-23-28_20250110123652107.parquet 2 20.0
100 B
Time taken: 1.038 seconds, Fetched 2 row(s)
spark-sql (default)> INSERT INTO hudi_table_mor_single_partition22
> SELECT 1 as id, 'A' as name, 30.0 as price, 99 as ts
> UNION ALL
> SELECT 2, 'B', 40.0, 99;
25/01/10 12:38:39 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet
fully supported by the writer. Please expect some unexpected behavior, until
its fully implemented.
25/01/10 12:38:40 WARN MetricsConfig: Cannot locate configuration: tried
hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
25/01/10 12:38:40 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet
fully supported by the writer. Please expect some unexpected behavior, until
its fully implemented.
25/01/10 12:38:40 WARN BaseHoodieCompactionPlanGenerator: No operations are
retrieved for
file:/tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
for table
file:///tmp/lakes/observed-default/dd/hudi_table_mor_single_partition22
Time taken: 3.059 seconds
spark-sql (default)> select * FROM hudi_table_mor_single_partition22 ORDER BY
id;
20250110123652107 20250110123652107_0_0 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0_0-23-27_20250110123652107.parquet 1 10.0
100 A
20250110123652107 20250110123652107_1_0 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0_1-23-28_20250110123652107.parquet 2 20.0
100 B
Time taken: 0.378 seconds, Fetched 2 row(s)
spark-sql (default)> INSERT INTO hudi_table_mor_single_partition22
> SELECT 1 as id, 'A' as name, 30.0 as price, 101 as ts
> UNION ALL
> SELECT 2, 'B', 40.0, 101;
Time taken: 1.087 seconds
spark-sql (default)> select * FROM hudi_table_mor_single_partition22 ORDER BY
id;
20250110123901621 20250110123901621_0_3 1 name=A
75f8fcb5-a6e9-400b-afd4-d31b64e82ce5-0 1 30.0 101 A
20250110123901621 20250110123901621_1_4 2 name=B
3386d4dc-f107-4403-a72e-5c145fa1ff7e-0 2 40.0 101 B
Time taken: 0.157 seconds, Fetched 2 row(s)
spark-sql (default)>
```
was (Author: JIRAUSER305408):
I could not reproduce the issue. Most likely the two sessions are not using the
same database name.
> Hudi 1.0 cannot use SQL to write older versioned Hudi table
> -----------------------------------------------------------
>
> Key: HUDI-8818
> URL: https://issues.apache.org/jira/browse/HUDI-8818
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Shawn Chang
> Assignee: Davis Zhang
> Priority: Blocker
> Fix For: 1.0.1
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> When using Hudi 1.0 + Spark SQL to write a table created by Hudi 0.14 + Spark
> 3.5.0 DF, we noticed that the INSERT query would fail due to a database config
> conflict
> {code:java}
> Config conflict(key current value existing value):
> hoodie.database.name: yxchang_nolf
> org.apache.hudi.exception.HoodieException: Config conflict(key current
> value existing value):
> hoodie.database.name: yxchang_nolf
> at
> org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:256)
> at
> org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:245)
> at
> org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$write$1(HoodieSparkSqlWriter.scala:190)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)