Hello!

I would like to use Spark Structured Streaming integrated with Kafka,
as described here:
https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html


but I get the following error:

Caused by: org.apache.spark.sql.AnalysisException: Failed to find data source: kafka.
Please deploy the application as per the deployment section of
"Structured Streaming + Kafka Integration Guide".;

even though I've added the spark-sql-kafka dependency to the generated fat jar:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.4.3</version>
  <scope>compile</scope>
</dependency>
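A possible cause, offered as an assumption to check rather than a confirmed diagnosis: Spark resolves the short name "kafka" through Java's ServiceLoader, which reads META-INF/services/org.apache.spark.sql.sources.DataSourceRegister from the jars on the classpath. When a fat jar is assembled, same-named files from different dependencies can overwrite each other, so if spark-sql (which ships its own copy of that file) is bundled too, the line registering org.apache.spark.sql.kafka010.KafkaSourceProvider may be lost. A minimal sketch of one mitigation, assuming the pom also declares spark-sql (not shown above) and that Spark is already installed on the cluster, is to mark Spark itself as provided so only the Kafka connector's copy of the file ends up in the jar:

```xml
<dependencies>
  <!-- Assumed to be declared in the pom; Spark is already on the cluster classpath, so it is not bundled -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.3</version>
    <scope>provided</scope>
  </dependency>
  <!-- The Kafka connector (and its kafka-clients dependency) is not part of the Spark distribution, so it is bundled -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.4.3</version>
    <scope>compile</scope>
  </dependency>
</dependencies>
```

To see whether the registration survived, `unzip -p my-fat-jar-with-dependencies.jar META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` should print a line containing org.apache.spark.sql.kafka010.KafkaSourceProvider.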

When I submit with the --packages option instead:

spark-submit --master spark://spark-master:7077 --class myClass \
  --deploy-mode client \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.3 \
  my-fat-jar-with-dependencies.jar

the problem goes away.

Since the --packages option needs to download the libraries from the
internet, and my environment has no internet access, could you please
advise how I can include the Kafka dependencies in the fat jar, or
suggest another solution?
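If the fat jar is built with maven-assembly-plugin (an assumption; the build section is not shown), one possible alternative is maven-shade-plugin with its ServicesResourceTransformer, which concatenates META-INF/services files of the same name from all bundled jars instead of keeping only one, so the Kafka source's DataSourceRegister entry is not overwritten. This is a sketch, not a verified fix for this setup; the plugin version is an assumption.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- Assumed version; any recent release provides ServicesResourceTransformer -->
      <version>3.2.1</version>
      <executions>
        <execution>
          <!-- Build the shaded (fat) jar during "mvn package" -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- Merge META-INF/services files from all dependencies instead of keeping only one copy -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

As an untested workaround that bypasses the short-name lookup, the source can also be referenced by its fully qualified class name, e.g. `.format("org.apache.spark.sql.kafka010.KafkaSourceProvider")` in place of `.format("kafka")`, provided that class is present in the fat jar.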

Thank you.

Regards,

Florin
