Hi,

I am having trouble finding the proper Kafka libraries for Spark. The HDP
version is 3.1, and I tried the dependencies below, but they produce the
issue shown further down.

*POM entries:*

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.0.0.3.1.0.0-78</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>2.0.0.3.1.0.0-78</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.compat.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.2.3.1.0.0-78</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.3.2.3.1.0.0-78</version>
</dependency>
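
One thing I notice is that I have not declared the Structured Streaming Kafka
connector itself. As far as I understand, the "kafka" data source is provided
by spark-sql-kafka-0-10, so perhaps an entry like the following is what is
missing (the version is only my guess at the matching HDP Spark build):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.3.2.3.1.0.0-78</version>
</dependency>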

*Issue during spark-submit:*

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
        at com.example.ReadDataFromKafka$.main(ReadDataFromKafka.scala:18)
        at com.example.ReadDataFromKafka.main(ReadDataFromKafka.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)


Could someone tell me whether I am doing something wrong?

*Spark Submit:*

export KAFKA_KERBEROS_PARAMS="-Djava.security.auth.login.config=kafka.consumer.properties"
export KAFKA_OPTS="-Djava.security.auth.login.config=kafka.consumer.properties"
export SPARK_KAFKA_VERSION=NONE

spark-submit \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=kafka.consumer.properties" \
  --files "kafka.consumer.properties" \
  --class com.example.ReadDataFromKafka \
  HelloKafka-1.0-SNAPSHOT.jar
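
Alternatively, should the connector be supplied at submit time instead of
being bundled in the jar? Something like the following is what I had in mind
(Apache coordinates shown; I assume the HDP-specific version would need the
Hortonworks repository configured):

spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.2 \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=kafka.consumer.properties" \
  --files "kafka.consumer.properties" \
  --class com.example.ReadDataFromKafka \
  HelloKafka-1.0-SNAPSHOT.jar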

*Consumer Code:*
https://sparkbyexamples.com/spark/spark-batch-processing-produce-consume-kafka-topic/
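
My code follows that example; a rough reconstruction is below (broker and
topic names are placeholders, not my real values). Line 18 referenced in the
stack trace is the .load() call.

import org.apache.spark.sql.SparkSession

object ReadDataFromKafka {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadDataFromKafka")
      .getOrCreate()

    // Batch read from Kafka; the lookup of the "kafka" source fails at .load()
    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:6667") // placeholder broker
      .option("subscribe", "test-topic")                 // placeholder topic
      .load()

    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show(false)

    spark.stop()
  }
}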


Regards,
William R
