I have run into many library incompatibility issues myself, including JVM
headless problems where I had to uninstall the headless JVM and install the
full JDK, and worked through them one by one.
This page shows the same error as yours, so you may get away with making
the changes to your pom.xml that it suggests:
https://stackoverflow.com/questions/41303037/why-does-spark-application-fail-with-classnotfoundexception-failed-to-find-dat
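
As a sketch of what that answer is pointing at: the "Failed to find data
source: kafka" error usually means the spark-sql-kafka-0-10 artifact (which
contains the `kafka` data source) is missing from the classpath. Assuming
your HDP repository publishes it under the same 2.3.2.3.1.0.0-78 build
number as your other Spark artifacts, the pom.xml entry would look
something like:

```xml
<!-- Provides the "kafka" data source for Structured Streaming /
     DataFrameReader.format("kafka"); without it Spark throws the
     ClassNotFoundException you are seeing. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.3.2.3.1.0.0-78</version>
</dependency>
```

Note it should NOT be scope "provided" -- unlike spark-core it is not on
the cluster classpath by default, so either shade it into your application
jar or pass it to spark-submit with --packages instead.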

Good luck!

Backbutton.co.uk
¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}
makeuse.org
<http://www.backbutton.co.uk>


On Wed, 18 Mar 2020 at 16:36, William R <rspwill...@gmail.com> wrote:

> Hi,
>
> I am having difficulty getting the proper Kafka libraries for Spark. The
> HDP version is 3.1, and I tried the libraries below, but they produce the
> issue shown.
>
> *POM entry :*
>
> <dependency>
>     <groupId>org.apache.kafka</groupId>
>     <artifactId>kafka-clients</artifactId>
>     <version>2.0.0.3.1.0.0-78</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.kafka</groupId>
>     <artifactId>kafka_2.11</artifactId>
>     <version>2.0.0.3.1.0.0-78</version>
> </dependency>
>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-sql_${scala.compat.version}</artifactId>
>     <version>${spark.version}</version>
>     <scope>provided</scope>
> </dependency>
>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.11</artifactId>
>     <version>2.3.2.3.1.0.0-78</version>
>     <scope>provided</scope>
> </dependency>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-streaming_2.11</artifactId>
>     <version>2.3.2.3.1.0.0-78</version>
> </dependency>
>
> *Issues while spark-submit :*
>
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
> find data source: kafka. Please find packages at
> http://spark.apache.org/third-party-projects.html
>         at
> org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)
>         at
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
>         at
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
>         at com.example.ReadDataFromKafka$.main(ReadDataFromKafka.scala:18)
>         at com.example.ReadDataFromKafka.main(ReadDataFromKafka.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
>         at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
>         at
> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>
>
> Could someone help me figure out if I am doing something wrong?
>
> *Spark Submit:*
>
> export
> KAFKA_KERBEROS_PARAMS="-Djava.security.auth.login.config=kafka.consumer.properties"
> export
> KAFKA_OPTS="-Djava.security.auth.login.config=kafka.consumer.properties"
> export SPARK_KAFKA_VERSION=NONE
>
> spark-submit --conf
> "spark.driver.extraJavaOptions=-Djava.security.auth.login.conf=kafka.consumer.properties"
> --files "kafka.consumer.properties" --class com.example.ReadDataFromKafka
> HelloKafka-1.0-SNAPSHOT.jar
>
> *Consumer Code : *
> https://sparkbyexamples.com/spark/spark-batch-processing-produce-consume-kafka-topic/
>
>
> Regards,
> William R
>