Hello everyone,

We are currently testing structured streaming Kafka drivers. We submit on
YARN (2.7.3) with --packages
'org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0' without problems.
However, when we try to launch on Spark standalone with deploy-mode=cluster,
we get a "ClassNotFoundException: Failed to find data source: kafka" error,
even though the launch command adds the Kafka jars to -Dspark.jars
(see below) and the subsequent log states that these jars have been
successfully added.
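
For context, the stream is created roughly like the minimal sketch below
(not our actual driver code; the bootstrap servers and topic name are
placeholders). The error is thrown from the load() call:

    import org.apache.spark.sql.SparkSession

    object KafkaReadSketch {
      def main(args: Array[String]): Unit = {
        // Master and app name come from spark-submit.
        val spark = SparkSession.builder().getOrCreate()

        // load() is where Spark looks up the "kafka" data source and
        // where the ClassNotFoundException is thrown.
        val signals = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "host1:9092") // placeholder
          .option("subscribe", "signals")                   // placeholder topic
          .load()

        // Console sink just to keep the sketch self-contained.
        signals.writeStream
          .format("console")
          .start()
          .awaitTermination()
      }
    }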

All 10 jars exist in /home/spark/.ivy2 on all nodes. I manually checked that
the KafkaSourceProvider class does exist in
org.apache.spark_spark-sql-kafka-0-10_2.11-2.1.0.jar.
I further confirmed that there are no issues with the jars themselves by
launching the driver on YARN without the --packages option and manually
adding all 10 jars with the --jars option.
The nodes run Scala 2.11.8.
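
For the standalone case, I can also drop a quick probe like the sketch below
into the top of the driver's main() to check whether the driver classloader
can see the provider class at all (the fully qualified class name is my
assumption, based on the class I found inside the kafka sql jar):

    object ProviderProbe {
      def main(args: Array[String]): Unit = {
        // Assumed fully qualified name of the provider inside
        // org.apache.spark_spark-sql-kafka-0-10_2.11-2.1.0.jar.
        val providerClass = "org.apache.spark.sql.kafka010.KafkaSourceProvider"
        try {
          Class.forName(providerClass)
          println(s"Found $providerClass on the classpath")
        } catch {
          case _: ClassNotFoundException =>
            println(s"$providerClass is NOT visible to this classloader")
        }
      }
    }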

Any insight appreciated.

Thanks in advance!
Heji

1) The jars automatically added by spark-submit:

-Dspark.jars=file:/home/spark/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.1.0.jar,file:/home/spark/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar,file:/home/spark/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.1.0.jar,file:/home/spark/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar,file:/home/spark/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar,file:/home/spark/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar,file:/home/spark/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar,file:/home/spark/.ivy2/jars/org.scalatest_scalatest_2.11-2.2.6.jar,file:/home/spark/.ivy2/jars/org.scala-lang_scala-reflect-2.11.8.jar,file:/home/spark/.ivy2/jars/org.scala-lang.modules_scala-xml_2.11-1.0.2.jar


2) The Spark INFO messages showing these jars being added:

17/01/26 21:57:24 INFO SparkContext: Added JAR
file:/home/spark/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.1.0.jar
at 
spark://10.102.22.23:50513/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.1.0.jar
with timestamp 1485467844922
17/01/26 21:57:24 INFO SparkContext: Added JAR
file:/home/spark/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar
at spark://10.102.22.23:50513/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar
with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR
file:/home/spark/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.1.0.jar
at spark://10.102.22.23:50513/jars/org.apache.spark_spark-tags_2.11-2.1.0.jar
with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR
file:/home/spark/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar
at spark://10.102.22.23:50513/jars/org.spark-project.spark_unused-1.0.0.jar
with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR
file:/home/spark/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar at
spark://10.102.22.23:50513/jars/net.jpountz.lz4_lz4-1.3.0.jar with
timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR
file:/home/spark/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar
at spark://10.102.22.23:50513/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar
with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR
file:/home/spark/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar at
spark://10.102.22.23:50513/jars/org.slf4j_slf4j-api-1.7.16.jar with
timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR
file:/home/spark/.ivy2/jars/org.scalatest_scalatest_2.11-2.2.6.jar at
spark://10.102.22.23:50513/jars/org.scalatest_scalatest_2.11-2.2.6.jar
with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR
file:/home/spark/.ivy2/jars/org.scala-lang_scala-reflect-2.11.8.jar at
spark://10.102.22.23:50513/jars/org.scala-lang_scala-reflect-2.11.8.jar
with timestamp 1485467844924
17/01/26 21:57:24 INFO SparkContext: Added JAR
file:/home/spark/.ivy2/jars/org.scala-lang.modules_scala-xml_2.11-1.0.2.jar
at 
spark://10.102.22.23:50513/jars/org.scala-lang.modules_scala-xml_2.11-1.0.2.jar
with timestamp 1485467844924


3) The error message:

Caused by: java.lang.ClassNotFoundException: Failed to find data
source: kafka. Please find packages at
http://spark.apache.org/third-party-projects.html
        at 
org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:569)
        at 
org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
        at 
org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
        at 
org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:197)
        at 
org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
        at 
org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
        at 
org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
        at 
org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:124)
        at 
com.dematic.labs.analytics.diagnostics.spark.drivers.StructuredStreamingSignalCount$.main(StructuredStreamingSignalCount.scala:76)
        at 
com.dematic.labs.analytics.diagnostics.spark.drivers.StructuredStreamingSignalCount.main(StructuredStreamingSignalCount.scala)
        ... 6 more
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
