hudi-bot opened a new issue, #17199:
URL: https://github.com/apache/hudi/issues/17199
Exception: [^hfile.error]
Summary:
# Write an insert and an update to a MOR table with Spark 3.1 / Hudi 0.14.1
# Write an update with Spark 3.5 on current master (`adda6950e0aaa7353add88ee2fc0499d7135ee33`), using write table version 6
# Read the table with Spark 3.1 / Hudi 0.14.1 and get an exception
The hoodie.properties file still says the table version is 6.
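Since the failure hinges on which version the table metadata records, a quick way to confirm it is to grep `hoodie.properties` under the table's `.hoodie` directory. The snippet below fabricates a minimal properties file purely to illustrate the key being checked; on the real table from this repro the file lives at `/tmp/trips_table/.hoodie/hoodie.properties`:

```shell
# Fabricated hoodie.properties for illustration only; against the real table, run
# the grep on /tmp/trips_table/.hoodie/hoodie.properties instead
mkdir -p /tmp/hoodie_demo/.hoodie
printf 'hoodie.table.name=trips_table\nhoodie.table.version=6\n' > /tmp/hoodie_demo/.hoodie/hoodie.properties
grep 'hoodie.table.version' /tmp/hoodie_demo/.hoodie/hoodie.properties
# prints: hoodie.table.version=6
```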
Here is my runscript:
{code:java}
set_spark 3.1
hudi_spark_shell -p -v 0.14.1
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.common.table.HoodieTableConfig._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
import org.apache.hudi.common.model.HoodieRecord
import spark.implicits._
val tableName = "trips_table"
val basePath = "file:///tmp/trips_table"
val columns = Seq("ts","uuid","rider","driver","fare","city")
val data =
  Seq((1695159649087L,"334e26e9-8355-45cc-97c6-c31daf0df330","rider-A","driver-K",19.10,"san_francisco"),
      (1695091554788L,"e96c4396-3fad-413a-a942-4cb36106d721","rider-C","driver-M",27.70,"san_francisco"),
      (1695046462179L,"9909a8b1-2d15-4d3d-8ec9-efc48c536a00","rider-D","driver-L",33.90,"san_francisco"),
      (1695516137016L,"e3cf430c-889d-4015-bc98-59bdce1e530c","rider-F","driver-P",34.15,"sao_paulo"),
      (1695115999911L,"c8abbe79-8d89-47ea-b4ce-4d224bae5bfa","rider-J","driver-T",17.85,"chennai"));
var inserts = spark.createDataFrame(data).toDF(columns:_*)
inserts.write.format("hudi").
option("hoodie.datasource.write.partitionpath.field", "city").
option("hoodie.table.name", tableName).
option("hoodie.metadata.index.column.stats.enable", "true").
option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
mode(Overwrite).
save(basePath)
val updatesDf = spark.read.format("hudi").load(basePath).filter($"rider" === "rider-D").withColumn("fare", col("fare") * 10)
updatesDf.write.format("hudi").
option("hoodie.datasource.write.operation", "upsert").
option("hoodie.datasource.write.partitionpath.field", "city").
option("hoodie.table.name", tableName).
option("hoodie.metadata.index.column.stats.enable", "true").
option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
mode(Append).
save(basePath)
//exit
set_spark 3.5
hudi_spark_shell -j
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.common.table.HoodieTableConfig._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
import org.apache.hudi.common.model.HoodieRecord
import spark.implicits._
val tableName = "trips_table"
val basePath = "file:///tmp/trips_table"
val updatesDf = spark.read.format("hudi").load(basePath).filter($"rider" === "rider-D").withColumn("fare", col("fare") * 10)
updatesDf.write.format("hudi").
option("hoodie.datasource.write.operation", "upsert").
option("hoodie.datasource.write.partitionpath.field", "city").
option("hoodie.table.name", tableName).
option("hoodie.metadata.index.column.stats.enable", "true").
option("hoodie.write.table.version", "6").
option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
mode(Append).
save(basePath)
//exit
set_spark 3.1
hudi_spark_shell -p -v 0.14.1
spark.read.format("hudi").
option("hoodie.metadata.enable", "true").
option("hoodie.enable.data.skipping", "true").
option("hoodie.metadata.index.column.stats.enable", "true").
load("/tmp/trips_table").filter("fare > 100").show(100,false) {code}
Command for running 0.14.1 with Spark 3.1 via `--packages`:
{code:java}
/Users/jon/Documents/sparkroot/spark-3.1.3-bin-hadoop3.2/bin/spark-shell \
  --packages org.apache.hudi:hudi-spark3.1-bundle_2.12:0.14.1 \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
  --conf 'spark.sql.catalogImplementation=in-memory' {code}
Command for running with current master on Spark 3.5:
{code:java}
/Users/jon/Documents/sparkroot/spark-3.5.2-bin-hadoop3/bin/spark-shell \
  --jars /Users/jon/git/hudi-versions/current/spark3.5/hudi/packaging/hudi-spark-bundle/target/hudi-spark3.5-bundle_2.12-1.1.0-SNAPSHOT.jar \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.sql.catalogImplementation=in-memory' {code}
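The two launch commands differ mainly in how the bundle is supplied. For reference, the `--packages` coordinate follows the `org.apache.hudi:hudi-spark<spark-version>-bundle_<scala-version>:<hudi-version>` naming convention, so it can be parameterized by Spark version (a sketch; the variable names are illustrative, not from the report):

```shell
# Build the Hudi bundle coordinate for a given Spark/Scala/Hudi version combination
SPARK_VERSION=3.1
SCALA_VERSION=2.12
HUDI_VERSION=0.14.1
PKG="org.apache.hudi:hudi-spark${SPARK_VERSION}-bundle_${SCALA_VERSION}:${HUDI_VERSION}"
echo "$PKG"   # org.apache.hudi:hudi-spark3.1-bundle_2.12:0.14.1
```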
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-9791
- Type: Bug
- Fix version(s):
- 1.1.0
- Attachment(s):
  - 05/Sep/25 18:44, jonvex, hfile.error: https://issues.apache.org/jira/secure/attachment/13078264/hfile.error
---
## Comments
05/Sep/25 21:23;linliu;{code:java}
➜ ~ spark-3.2
Welcome to Spark version 3.2.1
Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 1.8.0_392-internal
➜ ~ export SPARK_VERSION=3.2
➜ ~ spark-shell --packages org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:0.14.1 \
    --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
    --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
    --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar'
:: resolving dependencies :: found org.apache.hudi#hudi-spark3.2-bundle_2.12;0.14.1 in local-m2-cache
confs: [default] 0 artifacts copied, 1 already retrieved (0kB/10ms)
25/09/05 14:17:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/09/05 14:17:24 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context available as 'sc' (master = local[*], app id = local-1757107044815).
Spark session available as 'spark'.
scala> import scala.collection.JavaConversions._
scala> import org.apache.spark.sql.SaveMode._
scala> import org.apache.hudi.DataSourceReadOptions._
scala> import org.apache.hudi.DataSourceWriteOptions._
scala> import org.apache.hudi.common.table.HoodieTableConfig._
scala> import org.apache.hudi.config.HoodieWriteConfig._
scala> import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
scala> import org.apache.hudi.common.model.HoodieRecord
scala> import spark.implicits._

scala> val tableName = "trips_table"
tableName: String = trips_table

scala> val basePath = "file:///tmp/trips_table"
basePath: String = file:///tmp/trips_table

scala> val columns = Seq("ts","uuid","rider","driver","fare","city")
columns: Seq[String] = List(ts, uuid, rider, driver, fare, city)
scala> val data =
     |   Seq((1695159649087L,"334e26e9-8355-45cc-97c6-c31daf0df330","rider-A","driver-K",19.10,"san_francisco"),
     |       (1695091554788L,"e96c4396-3fad-413a-a942-4cb36106d721","rider-C","driver-M",27.70,"san_francisco"),
     |       (1695046462179L,"9909a8b1-2d15-4d3d-8ec9-efc48c536a00","rider-D","driver-L",33.90,"san_francisco"),
     |       (1695516137016L,"e3cf430c-889d-4015-bc98-59bdce1e530c","rider-F","driver-P",34.15,"sao_paulo"),
     |       (1695115999911L,"c8abbe79-8d89-47ea-b4ce-4d224bae5bfa","rider-J","driver-T",17.85,"chennai"));
data: Seq[(Long, String, String, String, Double, String)] = List((1695159649087,334e26e9-8355-45cc-97c6-c31daf0df330,rider-A,driver-K,19.1,san_francisco), (1695091554788,e96c4396-3fad-413a-a942-4cb36106d721,rider-C,driver-M,27.7,san_francisco), (1695046462179,9909a8b1-2d15-4d3d-8ec9-efc48c536a00,rider-D,driver-L,33.9,san_francisco), (1695516137016,e3cf430c-889d-4015-bc98-59bdce1e530c,rider-F,driver-P,34.15,sao_paulo), (1695115999911,c8abbe79-8d89-47ea-b4ce-4d224bae5bfa,rider-J,driver-T,17.85,chennai))

scala> var inserts = spark.createDataFrame(data).toDF(columns:_*)
inserts: org.apache.spark.sql.DataFrame = [ts: bigint, uuid: string ... 4 more fields]
scala> inserts.write.format("hudi").
     |   option("hoodie.datasource.write.partitionpath.field", "city").
     |   option("hoodie.table.name", tableName).
     |   option("hoodie.metadata.index.column.stats.enable", "true").
     |   option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
     |   mode(Overwrite).
     |   save(basePath)
25/09/05 14:18:44 WARN HoodieSparkSqlWriterInternal: Choosing BULK_INSERT as the operation type since auto record key generation is applicable
25/09/05 14:18:44 WARN HoodieSparkSqlWriterInternal: hoodie table at file:/tmp/trips_table already exists. Deleting existing data & overwriting with new data.
25/09/05 14:18:48 WARN AutoRecordKeyGenerationUtils$: Precombine field ts will be ignored with auto record key generation enabled
[... HiveConf, ObjectStore, MetricsConfig, and repeated JOL "Unable to attach Serviceability Agent" warnings elided ...]
scala> val updatesDf = spark.read.format("hudi").load(basePath).filter($"rider" === "rider-D").withColumn("fare", col("fare") * 10)
updatesDf: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 9 more fields]

scala> updatesDf.write.format("hudi").
     |   option("hoodie.datasource.write.operation", "upsert").
     |   option("hoodie.datasource.write.partitionpath.field", "city").
     |   option("hoodie.table.name", tableName).
     |   option("hoodie.metadata.index.column.stats.enable", "true").
     |   option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
     |   mode(Append).
     |   save(basePath)
25/09/05 14:19:04 WARN HoodieWriterUtils$: Changing operation type to UPSERT PREPPED for pk less table upserts
25/09/05 14:19:04 WARN AutoRecordKeyGenerationUtils$: Precombine field ts will be ignored with auto record key generation enabled
25/09/05 14:19:08 WARN HoodieSparkSqlWriterInternal: Closing write client

scala> exit
<console>:73: error: not found: value exit
➜ ~ spark-3.5
Welcome to Spark version 3.5.0
Using Scala version 2.12.18, OpenJDK 64-Bit Server VM, 1.8.0_392-internal
➜ ~ export SPARK_VERSION=3.5  # or 3.4, 3.3
➜ ~ spark-shell --packages org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:1.1.0-SNAPSHOT \
    --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
    --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
    --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
    --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar'
:: resolving dependencies :: found org.apache.hudi#hudi-spark3.5-bundle_2.12;1.1.0-SNAPSHOT in local-m2-cache
   found org.apache.hive#hive-storage-api;2.8.1 in local-m2-cache
   found org.slf4j#slf4j-api;1.7.36 in local-m2-cache
confs: [default] 0 artifacts copied, 3 already retrieved (0kB/7ms)
25/09/05 14:19:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/09/05 14:19:46 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context available as 'sc' (master = local[*], app id = local-1757107186666).
Spark session available as 'spark'.
scala> import scala.collection.JavaConversions._
scala> import org.apache.spark.sql.SaveMode._
scala> import org.apache.hudi.DataSourceReadOptions._
scala> import org.apache.hudi.DataSourceWriteOptions._
scala> import org.apache.hudi.common.table.HoodieTableConfig._
scala> import org.apache.hudi.config.HoodieWriteConfig._
scala> import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
scala> import org.apache.hudi.common.model.HoodieRecord
scala> import spark.implicits._

scala> val tableName = "trips_table"
tableName: String = trips_table

scala> val basePath = "file:///tmp/trips_table"
basePath: String = file:///tmp/trips_table

scala> val updatesDf = spark.read.format("hudi").load(basePath).filter($"rider" === "rider-D").withColumn("fare", col("fare") * 10)
25/09/05 14:20:12 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/09/05 14:20:12 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
25/09/05 14:20:13 WARN ConfigUtils: The configuration key 'hoodie.compaction.record.merger.strategy' has been deprecated and may be removed in the future. Please use the new key 'hoodie.record.merge.strategy.id' instead.
updatesDf: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 9 more fields]
scala> updatesDf.write.format("hudi").option("hoodie.datasource.write.operation", "upsert").option("hoodie.datasource.write.partitionpath.field", "city").option("hoodie.table.name", tableName).option("hoodie.metadata.index.column.stats.enable", "true").option("hoodie.write.table.version", "6").option("hoodie.datasource.write.table.type", "MERGE_ON_READ").mode(Append).save(basePath)
25/09/05 14:20:16 WARN HoodieWriterUtils$: Changing operation type to UPSERT PREPPED for pk less table upserts
25/09/05 14:20:16 WARN AutoRecordKeyGenerationUtils$: Precombine field ts will be ignored with auto record key generation enabled
25/09/05 14:20:16 WARN HoodieWriteConfig: HoodieTableVersion.SIX is not yet fully supported by the writer. Please expect some unexpected behavior, until its fully implemented.
[... repeated ConfigUtils deprecation, HoodieTableVersion.SIX, JOL, and MetricsConfig warnings elided ...]
➜ ~ spark-3.2
Welcome to Spark version 3.2.1
Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 1.8.0_392-internal
➜ ~ export SPARK_VERSION=3.2
➜ ~ spark-shell --packages org.apache.hudi:hudi-spark$SPARK_VERSION-bundle_2.12:0.14.1 \
    --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
    --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
    --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar'
:: resolving dependencies :: found org.apache.hudi#hudi-spark3.2-bundle_2.12;0.14.1 in local-m2-cache
confs: [default] 0 artifacts copied, 1 already retrieved (0kB/6ms)
25/09/05 14:21:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/09/05 14:21:09 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context available as 'sc' (master = local[*], app id = local-1757107269339).
Spark session available as 'spark'.
scala> spark.read.format("hudi").option("hoodie.metadata.enable", "true").option("hoodie.enable.data.skipping", "true").option("hoodie.metadata.index.column.stats.enable", "true").load("/tmp/trips_table").filter("fare > 100").show(100,false)
25/09/05 14:21:18 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
25/09/05 14:21:18 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
[... JOL and MetricsConfig warnings elided ...]
+-------------------+---------------------+---------------------+----------------------+--------------------------------------+-------------+------------------------------------+-------+--------+------+-------------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key   |_hoodie_partition_path|_hoodie_file_name                     |ts           |uuid                                |rider  |driver  |fare  |city         |
+-------------------+---------------------+---------------------+----------------------+--------------------------------------+-------------+------------------------------------+-------+--------+------+-------------+
|20250905142017073  |20250905142017073_0_1|20250905141844928_2_0|san_francisco         |fbe58388-7307-4159-90fa-f2e0b63f111d-0|1695046462179|9909a8b1-2d15-4d3d-8ec9-efc48c536a00|rider-D|driver-L|3390.0|san_francisco|
+-------------------+---------------------+---------------------+----------------------+--------------------------------------+-------------+------------------------------------+-------+--------+------+-------------+
{code};;;
---
05/Sep/25 21:24;linliu;Based on the above experiment, Spark 3.2 + Hudi 0.14.1 and Spark 3.5 + Hudi 1.1.0-SNAPSHOT both work.;;;
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]