hudi-bot opened a new issue, #15214:
URL: https://github.com/apache/hudi/issues/15214
When I execute the following command in the adhoc-2 container:
{code:java}
spark-submit \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
--table-type COPY_ON_WRITE \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field ts \
--target-base-path /user/hive/warehouse/stock_ticks_cow \
--target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider {code}
It fails with the following error:
{code:java}
root@adhoc-2:/opt# spark-submit \
> --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
> --table-type COPY_ON_WRITE \
> --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
> --source-ordering-field ts \
> --target-base-path /user/hive/warehouse/stock_ticks_cow \
> --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
> --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR file:/opt/%C2%A0
    at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:657)
    at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:221)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:116)
    at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$1.<init>(SparkSubmit.scala:907)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:907)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:81)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code}
When I check the environment variable $HUDI_UTILITIES_BUNDLE, I got this:
{code:java}
root@adhoc-2:/opt# echo $HUDI_UTILITIES_BUNDLE
/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar
{code}
But I can't find that jar file:
{code:java}
root@adhoc-2:/opt# ls -ltr /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar
ls: cannot access '/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar': No such file or directory {code}
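Until the path issue is sorted out, a pre-flight existence check avoids the cryptic spark-submit failure. This is just a sketch I put together, not anything from the Hudi docs:

```shell
# Fail fast if the env var is unset or points at a missing jar.
# HUDI_UTILITIES_BUNDLE is whatever the shell currently has exported.
if [ -n "$HUDI_UTILITIES_BUNDLE" ] && [ -f "$HUDI_UTILITIES_BUNDLE" ]; then
  echo "bundle found: $HUDI_UTILITIES_BUNDLE"
else
  echo "bundle missing or unset: '$HUDI_UTILITIES_BUNDLE'" >&2
fi
```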
When I try to find it:
{code:java}
root@adhoc-2:/opt# find /var/hoodie/ws -name "hudi-utilities-bundle*.0.jar" | xargs ls -ltr
-rw-r--r-- 1 root root 60631874 Jun 8 07:41 /var/hoodie/ws/hudi-examples/hudi-examples-spark/target/lib/hudi-utilities-bundle_2.11-0.11.0.jar
-rw-r--r-- 1 root root 60631874 Jun 8 07:41 /var/hoodie/ws/hudi-cli/target/lib/hudi-utilities-bundle_2.11-0.11.0.jar
-rw-r--r-- 1 root root 60631874 Jun 8 07:41 /var/hoodie/ws/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0.jar
{code}
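The three copies found above have identical sizes, so pointing the variable at any of them should behave the same; I went with the one under packaging/ (assumption on my part, since that is where the bundle module normally builds to):

```shell
# Repoint the env var at one of the jars that find located above.
export HUDI_UTILITIES_BUNDLE=/var/hoodie/ws/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0.jar
echo "$HUDI_UTILITIES_BUNDLE"
```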
So I modified the environment variable $HUDI_UTILITIES_BUNDLE and resubmitted the command, and it worked:
{code:java}
root@adhoc-2:/opt# spark-submit \
> --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
> --table-type COPY_ON_WRITE \
> --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
> --source-ordering-field ts \
> --target-base-path /user/hive/warehouse/stock_ticks_cow \
> --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
> --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
22/06/09 01:43:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/06/09 01:43:35 WARN SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
22/06/09 01:43:36 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
22/06/09 01:43:36 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
22/06/09 01:43:36 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
22/06/09 01:43:37 WARN KafkaUtils: overriding enable.auto.commit to false for executor
22/06/09 01:43:37 WARN KafkaUtils: overriding auto.offset.reset to none for executor
22/06/09 01:43:37 ERROR KafkaUtils: group.id is null, you should probably set it
22/06/09 01:43:37 WARN KafkaUtils: overriding executor group.id to spark-executor-null
22/06/09 01:43:37 WARN KafkaUtils: overriding receive.buffer.bytes to 65536 see KAFKA-3135
22/06/09 01:43:38 WARN HoodieBackedTableMetadata: Metadata table was not found at path /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata
00:05 WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow. Falling back to direct markers.
00:06 WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow. Falling back to direct markers.
00:08 WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow. Falling back to direct markers.
{code}
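Side note: the two DFSPropertiesConfiguration warnings above look harmless, but if you want them gone, the log suggests pointing HUDI_CONF_DIR at a directory containing a (possibly empty) hudi-defaults.conf. A sketch; the /tmp path is just my choice of a writable directory:

```shell
# Any directory works; per the warning, Hudi looks for $HUDI_CONF_DIR/hudi-defaults.conf.
mkdir -p /tmp/hudi-conf
touch /tmp/hudi-conf/hudi-defaults.conf
export HUDI_CONF_DIR=/tmp/hudi-conf
```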
I could see that the data had been written to HDFS:
{code:java}
root@adhoc-2:/opt# hdfs dfs -ls /user/hive/warehouse/stock_ticks_cow/*/*/*/*
Found 1 items
drwxr-xr-x - root supergroup 0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/.aux/.bootstrap
-rw-r--r-- 1 root supergroup 8056 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit
-rw-r--r-- 1 root supergroup 3035 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit.inflight
-rw-r--r-- 1 root supergroup 0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit.requested
-rw-r--r-- 1 root supergroup 8139 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit
-rw-r--r-- 1 root supergroup 3035 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit.inflight
-rw-r--r-- 1 root supergroup 0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit.requested
-rw-r--r-- 1 root supergroup 599 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/hoodie.properties
-rw-r--r-- 1 root supergroup 124 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-0-0
-rw-r--r-- 1 root supergroup 21951 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-10-10
-rw-r--r-- 1 root supergroup 93 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.hoodie_partition_metadata
-rw-r--r-- 1 root supergroup 96 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/2018/08/31/.hoodie_partition_metadata
-rw-r--r-- 1 root supergroup 436884 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/2018/08/31/7610b058-8df2-484a-ba70-881feef7195e-0_0-36-35_20220609014338711.parquet
{code}
So my question is: do I need to modify $HUDI_UTILITIES_BUNDLE, or should the default value in the image already be correct?
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-4211
- Type: Bug
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]