Hi Michael,

Thanks for the catch. I assume you meant *spark-streaming-kafka-0-10_2.11-2.1.0.jar*.

I added it on all Spark machines under SPARK_HOME/jars, but the same error still persists. Is that the right jar, or is there anything else I need to add?

Thanks!

On Tue, May 16, 2017 at 1:40 PM, Michael Armbrust <mich...@databricks.com> wrote:

> Looks like you are missing the kafka dependency.
>
> On Tue, May 16, 2017 at 1:04 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Looks like I am getting the following runtime exception. I am using Spark
>> 2.1.0 and the following jars:
>>
>> *spark-sql_2.11-2.1.0.jar*
>> *spark-sql-kafka-0-10_2.11-2.1.0.jar*
>> *spark-streaming_2.11-2.1.0.jar*
>>
>> Exception in thread "stream execution thread for [id =
>> fcfe1fa6-dab3-4769-9e15-e074af622cc1, runId =
>> 7c54940a-e453-41de-b256-049b539b59b1]"
>> java.lang.NoClassDefFoundError:
>> org/apache/kafka/common/serialization/ByteArrayDeserializer
>>   at org.apache.spark.sql.kafka010.KafkaSourceProvider.createSource(KafkaSourceProvider.scala:74)
>>   at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:245)
>>   at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$logicalPlan$1.applyOrElse(StreamExecution.scala:127)
>>   at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$logicalPlan$1.applyOrElse(StreamExecution.scala:123)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>>
>> On Tue, May 16, 2017 at 10:30 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>>
>>> The default "startingOffsets" is "latest". If you don't push any data
>>> after starting the query, it won't fetch anything. You can set it to
>>> "earliest" like ".option("startingOffsets", "earliest")" to start the
>>> stream from the beginning.
>>>
>>> On Tue, May 16, 2017 at 12:36 AM, kant kodali <kanth...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have the following code:
>>>>
>>>> val ds = sparkSession.readStream()
>>>>   .format("kafka")
>>>>   .option("kafka.bootstrap.servers", bootstrapServers)
>>>>   .option("subscribe", topicName)
>>>>   .option("checkpointLocation", hdfsCheckPointDir)
>>>>   .load()
>>>>
>>>> val ds1 = ds.select($"value")
>>>> val query = ds1.writeStream.outputMode("append").format("console").start()
>>>> query.awaitTermination()
>>>>
>>>> There are no errors when I execute this code; however, I don't see any
>>>> data being printed to the console. When I run my standalone test Kafka
>>>> consumer jar, I can see that it is receiving messages, so I am not sure
>>>> what is going on with the above code. Any ideas?
>>>>
>>>> Thanks!
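[Editor's note] The `org.apache.kafka.common.serialization.ByteArrayDeserializer` class in the stack trace above comes from the Kafka client library (`kafka-clients`), which `spark-sql-kafka-0-10` depends on but which is not bundled with Spark itself, so copying only the `spark-*` jars into SPARK_HOME/jars leaves it unresolved. One way around hand-copying individual jars is to let `spark-submit` resolve the package and its transitive dependencies from Maven Central. A minimal sketch; the application jar name and the master URL are placeholders, not from the thread:

```shell
# Resolve spark-sql-kafka-0-10 and its transitive dependencies
# (including kafka-clients) from Maven Central at submit time.
# "my-streaming-app.jar" and the master URL are placeholders.
spark-submit \
  --master spark://master-host:7077 \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 \
  my-streaming-app.jar
```

Alternatively, placing the matching `kafka-clients` jar (e.g. kafka-clients-0.10.0.1.jar for the Spark 2.1.0 Kafka connector) under SPARK_HOME/jars on every node, alongside the jars already listed, should also resolve the `NoClassDefFoundError`.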