hudi-bot opened a new issue, #15214:
URL: https://github.com/apache/hudi/issues/15214

   When I execute the following command in the adhoc-2 container:
   {code:java}
   spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
     --table-type COPY_ON_WRITE \
     --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
     --source-ordering-field ts \
     --target-base-path /user/hive/warehouse/stock_ticks_cow \
     --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
     --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider {code}
   The error is as follows:
   {code:java}
   root@adhoc-2:/opt# spark-submit \
   >   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
   >   --table-type COPY_ON_WRITE \
   >   --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
   >   --source-ordering-field ts \
   >   --target-base-path /user/hive/warehouse/stock_ticks_cow \
   >   --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
   >   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR file:/opt/%C2%A0
       at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:657)
       at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:221)
       at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:116)
       at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$1.<init>(SparkSubmit.scala:907)
       at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:907)
       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:81)
       at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code}
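   The `%C2%A0` in `JAR file:/opt/%C2%A0` is the URL-encoded UTF-8 non-breaking space (U+00A0), a character that often sneaks in when a command is copy-pasted from a web page. A small illustration (the `printf`/`od` invocation is mine, not part of the original report) confirms what those two bytes are:

```shell
# %C2%A0 decodes to the two bytes 0xC2 0xA0, i.e. the UTF-8 encoding of the
# non-breaking space U+00A0. Reproduce the bytes and dump them in hex:
printf '\302\240' | od -An -tx1
```

   In a terminal a non-breaking space looks identical to a normal space, so a broken argument like this is invisible in the pasted command.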
   When I checked the environment variable $HUDI_UTILITIES_BUNDLE, I got this:
   {code:java}
   root@adhoc-2:/opt# echo $HUDI_UTILITIES_BUNDLE
   /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar 
{code}
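   If the command or the variable value was copy-pasted, a stray non-breaking space can hide in it. A hedged sketch of checking a value for invisible non-ASCII bytes and stripping them (the sample value with a trailing NBSP appended is a made-up illustration, not the reporter's actual variable):

```shell
# Hypothetical value with a trailing non-breaking space (0xC2 0xA0) appended,
# mimicking what a copy-paste from a web page can produce.
BUNDLE=$(printf '/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar\302\240')

# od -c shows every byte, so the invisible 302 240 sequence becomes visible.
printf '%s' "$BUNDLE" | od -c | tail -n 2

# Keep only tab, newline, carriage return, and printable ASCII.
CLEAN=$(printf '%s' "$BUNDLE" | tr -cd '\11\12\15\40-\176')
printf '%s\n' "$CLEAN"
```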
   But I can't find that jar file:
   {code:java}
   root@adhoc-2:/opt# ls -ltr /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar
   ls: cannot access '/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar': No such file or directory {code}
   When I try to find the jar:
   {code:java}
   root@adhoc-2:/opt# find /var/hoodie/ws -name "hudi-utilities-bundle*.0.jar" | xargs ls -ltr
   -rw-r--r-- 1 root root 60631874 Jun  8 07:41 /var/hoodie/ws/hudi-examples/hudi-examples-spark/target/lib/hudi-utilities-bundle_2.11-0.11.0.jar
   -rw-r--r-- 1 root root 60631874 Jun  8 07:41 /var/hoodie/ws/hudi-cli/target/lib/hudi-utilities-bundle_2.11-0.11.0.jar
   -rw-r--r-- 1 root root 60631874 Jun  8 07:41 /var/hoodie/ws/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0.jar
   {code}
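   Since the bundle jar does exist under the workspace, one option is to re-point the variable at it before resubmitting. A sketch (the choice of the packaging/ copy over the other two is my assumption):

```shell
# Locate the utilities bundle jar under the workspace and export it,
# so spark-submit receives a path that actually exists.
WS=/var/hoodie/ws
JAR=$(find "$WS/packaging/hudi-utilities-bundle/target" \
       -name 'hudi-utilities-bundle_*.jar' 2>/dev/null | head -n 1)
if [ -n "$JAR" ] && [ -f "$JAR" ]; then
  export HUDI_UTILITIES_BUNDLE="$JAR"
fi
echo "$HUDI_UTILITIES_BUNDLE"
```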
   So I modified the environment variable $HUDI_UTILITIES_BUNDLE and resubmitted the command, and it worked:
   {code:java}
   root@adhoc-2:/opt# spark-submit \
   >   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
   >   --table-type COPY_ON_WRITE \
   >   --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
   >   --source-ordering-field ts \
   >   --target-base-path /user/hive/warehouse/stock_ticks_cow \
   >   --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
   >   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   22/06/09 01:43:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   22/06/09 01:43:35 WARN SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
   22/06/09 01:43:36 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
   22/06/09 01:43:36 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
   22/06/09 01:43:36 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
   22/06/09 01:43:37 WARN KafkaUtils: overriding enable.auto.commit to false for executor
   22/06/09 01:43:37 WARN KafkaUtils: overriding auto.offset.reset to none for executor
   22/06/09 01:43:37 ERROR KafkaUtils: group.id is null, you should probably set it
   22/06/09 01:43:37 WARN KafkaUtils: overriding executor group.id to spark-executor-null
   22/06/09 01:43:37 WARN KafkaUtils: overriding receive.buffer.bytes to 65536 see KAFKA-3135
   22/06/09 01:43:38 WARN HoodieBackedTableMetadata: Metadata table was not found at path /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata
   00:05  WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow.  Falling back to direct markers.
   00:06  WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow.  Falling back to direct markers.
   00:08  WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow.  Falling back to direct markers. {code}
   I could see that the data had been written to HDFS:
   {code:java}
   root@adhoc-2:/opt# hdfs dfs -ls /user/hive/warehouse/stock_ticks_cow/*/*/*/*
   Found 1 items
   drwxr-xr-x   - root supergroup          0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/.aux/.bootstrap
   -rw-r--r--   1 root supergroup       8056 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit
   -rw-r--r--   1 root supergroup       3035 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit.inflight
   -rw-r--r--   1 root supergroup          0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit.requested
   -rw-r--r--   1 root supergroup       8139 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit
   -rw-r--r--   1 root supergroup       3035 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit.inflight
   -rw-r--r--   1 root supergroup          0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit.requested
   -rw-r--r--   1 root supergroup        599 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/hoodie.properties
   -rw-r--r--   1 root supergroup        124 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-0-0
   -rw-r--r--   1 root supergroup      21951 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-10-10
   -rw-r--r--   1 root supergroup         93 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.hoodie_partition_metadata
   -rw-r--r--   1 root supergroup         96 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/2018/08/31/.hoodie_partition_metadata
   -rw-r--r--   1 root supergroup     436884 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/2018/08/31/7610b058-8df2-484a-ba70-881feef7195e-0_0-36-35_20220609014338711.parquet
   {code}
   So my question is: do I need to modify $HUDI_UTILITIES_BUNDLE?
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-4211
   - Type: Bug

