Hi, I am having difficulty finding the proper Kafka libraries for Spark. The HDP version is 3.1; I tried the libraries below, but they produce the following issue.
*POM entry:*

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.0.0.3.1.0.0-78</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.11</artifactId>
        <version>2.0.0.3.1.0.0-78</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.compat.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.3.2.3.1.0.0-78</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.3.2.3.1.0.0-78</version>
    </dependency>

*Issue during spark-submit:*

    Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
        at com.example.ReadDataFromKafka$.main(ReadDataFromKafka.scala:18)
        at com.example.ReadDataFromKafka.main(ReadDataFromKafka.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)

Could someone tell me if I am doing something wrong?

*Spark submit:*

    export KAFKA_KERBEROS_PARAMS="-Djava.security.auth.login.config=kafka.consumer.properties"
    export KAFKA_OPTS="-Djava.security.auth.login.config=kafka.consumer.properties"
    export SPARK_KAFKA_VERSION=NONE
    spark-submit \
      --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.conf=kafka.consumer.properties" \
      --files "kafka.consumer.properties" \
      --class com.example.ReadDataFromKafka \
      HelloKafka-1.0-SNAPSHOT.jar

*Consumer code:* https://sparkbyexamples.com/spark/spark-batch-processing-produce-consume-kafka-topic/

Regards,
William R
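P.S. While writing this up I noticed that the Spark Structured Streaming + Kafka integration guide mentions a separate artifact, spark-sql-kafka-0-10, which my POM does not include; as far as I understand, that artifact is what registers the "kafka" data source that the exception says is missing. Would adding something like the following be the right fix? (The version here is my guess, matching the HDP 3.1 Spark 2.3.2 build numbers used above.)

```xml
<!-- Assumption: spark-sql-kafka-0-10 provides/registers the "kafka" data source.
     Version is guessed to match the HDP 3.1 Spark 2.3.2 build used elsewhere in this POM. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.3.2.3.1.0.0-78</version>
</dependency>
```

I assume this jar would also need to be on the classpath at run time, e.g. by shading it into the application jar or by passing `--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.2` to spark-submit, since the error occurs at run time rather than at compile time.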