I have run into many library incompatibility issues myself, including JVM headless problems where I had to uninstall the headless JVM and install the full JDK before things worked. This Stack Overflow question shows the same error as yours; you may be able to resolve it by making the changes to your pom.xml suggested there:
https://stackoverflow.com/questions/41303037/why-does-spark-application-fail-with-classnotfoundexception-failed-to-find-dat
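For what it's worth, this particular error usually means the Kafka connector for Spark SQL is missing from the classpath — `kafka-clients` and `kafka_2.11` alone do not register Spark's `kafka` data source. A sketch of the POM entry that is typically needed; the version string here is an assumption, mirroring the HDP build number of the other Spark artifacts in the quoted POM, so verify it against your repository:

```xml
<!-- Sketch only: this artifact registers the "kafka" data source that
     DataSource.lookupDataSource fails to find in the stack trace below.
     Version assumed to match the HDP 3.1 Spark build; verify before use. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.3.2.3.1.0.0-78</version>
</dependency>
```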
Good Luck!

Backbutton.co.uk
¯\_(ツ)_/¯ ♡۶Java♡۶RMI ♡۶ Make Use Method {MUM} makeuse.org
<http://www.backbutton.co.uk>

On Wed, 18 Mar 2020 at 16:36, William R <rspwill...@gmail.com> wrote:

> Hi,
>
> I am finding it difficult to get the proper Kafka libraries for Spark. The
> version of HDP is 3.1, and the libraries below produce the issues below.
>
> *POM entry:*
>
>     <dependency>
>         <groupId>org.apache.kafka</groupId>
>         <artifactId>kafka-clients</artifactId>
>         <version>2.0.0.3.1.0.0-78</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.kafka</groupId>
>         <artifactId>kafka_2.11</artifactId>
>         <version>2.0.0.3.1.0.0-78</version>
>     </dependency>
>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-sql_${scala.compat.version}</artifactId>
>         <version>${spark.version}</version>
>         <scope>provided</scope>
>     </dependency>
>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-core_2.11</artifactId>
>         <version>2.3.2.3.1.0.0-78</version>
>         <scope>provided</scope>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.spark</groupId>
>         <artifactId>spark-streaming_2.11</artifactId>
>         <version>2.3.2.3.1.0.0-78</version>
>     </dependency>
>
> *Issues while spark-submit:*
>
> Exception in thread "main" java.lang.ClassNotFoundException: Failed to
> find data source: kafka.
> Please find packages at http://spark.apache.org/third-party-projects.html
>         at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
>         at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
>         at com.example.ReadDataFromKafka$.main(ReadDataFromKafka.scala:18)
>         at com.example.ReadDataFromKafka.main(ReadDataFromKafka.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>
> Could someone help me if I am doing something wrong?
>
> *Spark Submit:*
>
> export KAFKA_KERBEROS_PARAMS="-Djava.security.auth.login.config=kafka.consumer.properties"
> export KAFKA_OPTS="-Djava.security.auth.login.config=kafka.consumer.properties"
> export SPARK_KAFKA_VERSION=NONE
>
> spark-submit \
>   --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.conf=kafka.consumer.properties" \
>   --files "kafka.consumer.properties" \
>   --class com.example.ReadDataFromKafka \
>   HelloKafka-1.0-SNAPSHOT.jar
>
> *Consumer Code:*
> https://sparkbyexamples.com/spark/spark-batch-processing-produce-consume-kafka-topic/
>
> Regards,
> William R
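[Editor's note] For readers hitting the same error: the stack trace above means no `kafka` data source is on the driver/executor classpath at runtime. Besides adding the connector to the POM, a common alternative is to pull it in at submit time with `--packages`. A sketch of the quoted spark-submit command with that flag added, under the assumption of an upstream Spark 2.3.2 / Scala 2.11 build (swap in the exact HDP coordinates if you resolve from Hortonworks repositories):

```shell
# Sketch only: adds the Kafka SQL connector at submit time via --packages.
# The Maven coordinates assume the Apache Spark 2.3.2 / Scala 2.11 release;
# adjust to the matching HDP build if you use Hortonworks artifacts.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.2 \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=kafka.consumer.properties" \
  --files "kafka.consumer.properties" \
  --class com.example.ReadDataFromKafka \
  HelloKafka-1.0-SNAPSHOT.jar
```

Note that `--packages` resolves the artifact and its transitive dependencies from Maven Central and ships them to the executors, which avoids editing the POM at all for a quick test.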