Re: Spark streaming not processing messages from partitioned topics

2016-08-11 Thread Diwakar Dhanuskodi
Figured it out. All I was doing wrong was testing in a pseudo-node VM with one core; the tasks were left waiting for CPU. On a production cluster this works just fine. On Thu, Aug 11, 2016 at 12:45 AM, Diwakar Dhanuskodi < diwakar.dhanusk...@gmail.com> wrote: > Checked executor logs and UI .
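For readers hitting the same symptom: a minimal sketch (app name and batch interval are assumptions, not the poster's code) of giving a local test run enough cores so streaming tasks are not starved for CPU.

```scala
// Minimal sketch, assuming a local single-node test setup: with only one
// local core (local[1]), batch tasks can sit waiting for CPU, which looks
// like a hang. At least two cores (or local[*]) avoids that.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf()
  .setAppName("KafkaStreamingTest") // hypothetical app name
  .setMaster("local[2]")            // local[1] starves the job; local[*] also works
val ssc = new StreamingContext(sparkConf, Seconds(5)) // assumed batch interval
```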

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Diwakar Dhanuskodi
Checked executor logs and UI. There is no error message or anything like that. When there is any action, it just waits. There is data in the partitions; I could use kafka-simple-consumer-shell.sh and print all the data to the console. Am I doing anything wrong in foreachRDD? This just works fine with single

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Cody Koeninger
zookeeper.connect is irrelevant. Did you look at your executor logs? Did you look at the UI for the (probably failed) stages? Are you actually producing data into all of the kafka partitions? If you use kafka-simple-consumer-shell.sh to read that partition, do you get any data? On Wed, Aug 10,

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Diwakar Dhanuskodi
Hi Cody, Just added zookeeper.connect to kafkaParams. It couldn't come out of the batch window; other batches were queued. I could see foreach(println) of the DataFrame printing one partition's data but not the other's. Couldn't see any errors in the log. val brokers = "localhost:9092,localhost:9093"
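A sketch of the direct-stream setup under discussion (Spark 1.6-era API; broker addresses, group id, and topic name are taken from fragments in the thread, the rest is assumed). The direct API fetches offsets from the brokers themselves, so zookeeper.connect has no effect here.

```scala
// Sketch only; ssc is an existing StreamingContext.
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val brokers = "localhost:9092,localhost:9093"
val kafkaParams = Map[String, String](
  "metadata.broker.list" -> brokers, // direct API talks to brokers, not ZooKeeper
  "group.id" -> "xyz",
  "auto.offset.reset" -> "smallest")
val topicsSet = "topic.name".split(",").toSet
val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicsSet)
```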

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Cody Koeninger
Those logs you're posting are from right after your failure; they don't include what actually went wrong when attempting to read the json. Look at your logs more carefully. On Aug 10, 2016 2:07 AM, "Diwakar Dhanuskodi" wrote: > Hi Siva, > > With below code, it is stuck

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Sivakumaran S
I am testing with one partition now. I am using Kafka 0.9 and Spark 1.6.1 (Scala 2.11). Just start with one topic first and then add more. I am not partitioning the topic. HTH, Regards, Sivakumaran > On 10-Aug-2016, at 5:56 AM, Diwakar Dhanuskodi > wrote: >

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Diwakar Dhanuskodi
Hi Siva, With the below code, it is stuck at *sqlContext.read.json(rdd.map(_._2)).toDF()*. There are two partitions in the topic. I am running Spark 1.6.2. val topics = "topic.name" val brokers = "localhost:9092" val topicsSet = topics.split(",").toSet val sparkConf = new
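The pattern being debugged, sketched from the fragments in the thread (the full code is truncated in the archive; kafkaStream and sqlContext are assumed names): each micro-batch's message values are parsed as JSON into a DataFrame inside foreachRDD.

```scala
// Sketch of the step that appeared to hang under the thread's conditions.
kafkaStream.foreachRDD { rdd =>
  // rdd is an RDD[(String, String)] of (key, value); _._2 is the JSON payload
  val dataFrame = sqlContext.read.json(rdd.map(_._2))
  dataFrame.foreach(println)
}
```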

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Diwakar Dhanuskodi
Hi Siva, Does the topic have partitions? Which version of Spark are you using? On Wed, Aug 10, 2016 at 2:38 AM, Sivakumaran S wrote: > Hi, > > Here is a working example I did. > > HTH > > Regards, > > Sivakumaran S > > val topics = "test" > val brokers = "localhost:9092" > val

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Diwakar Dhanuskodi
It stops working at sqlContext.read.json(rdd.map(_._2)). Topics without partitions work fine. Do I need to set any other configs? val kafkaParams = Map[String,String]("bootstrap.servers"->"localhost:9093,localhost:9092", "group.id" -> "xyz", "auto.offset.reset"->"smallest") Spark version is

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Cody Koeninger
No, you don't need a conditional. read.json on an empty RDD will return an empty DataFrame. foreach on an empty DataFrame or an empty RDD won't do anything (a task will still get run, but it won't do anything). Leave the conditional out. Add one thing at a time to the working rdd.foreach

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Diwakar Dhanuskodi
Hi Cody, Without the conditional it goes along fine, but any processing inside the conditional gets stuck waiting (or something). Facing this issue with partitioned topics. I would need the conditional to skip processing when a batch is empty. kafkaStream.foreachRDD( rdd => { val dataFrame =
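If an explicit empty-batch guard is still wanted, rdd.isEmpty is the usual idiom; this is a sketch assuming the stream and context names from the thread, not the poster's actual conditional. Note that read.json on an empty RDD is already harmless, so the guard only saves a little work.

```scala
kafkaStream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) { // cheap check; skips the JSON read for empty batches
    val dataFrame = sqlContext.read.json(rdd.map(_._2))
    dataFrame.foreach(println)
  }
}
```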

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Sivakumaran S
Hi, Here is a working example I did. HTH Regards, Sivakumaran S val topics = "test" val brokers = "localhost:9092" val topicsSet = topics.split(",").toSet val sparkConf = new SparkConf().setAppName("KafkaWeatherCalc").setMaster("local") //spark://localhost:7077 val sc = new
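The archive truncates the example; below is a hedged reconstruction of a complete minimal app of this shape. The batch interval, master setting, and foreachRDD body are assumptions, not Sivakumaran's original code.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaWeatherCalc {
  def main(args: Array[String]): Unit = {
    val topics = "test"
    val brokers = "localhost:9092"
    val topicsSet = topics.split(",").toSet
    // local[2] rather than local: per the thread's resolution, a single
    // local core can leave streaming tasks waiting for CPU.
    val sparkConf = new SparkConf().setAppName("KafkaWeatherCalc").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc, Seconds(5)) // assumed batch interval
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)
    messages.foreachRDD(rdd => rdd.map(_._2).foreach(println)) // print raw values
    ssc.start()
    ssc.awaitTermination()
  }
}
```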

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Cody Koeninger
Take out the conditional and the sqlContext and just do rdd => { rdd.foreach(println) } as a baseline to see if you're reading the data you expect. On Tue, Aug 9, 2016 at 3:47 PM, Diwakar Dhanuskodi wrote: > Hi, > > I am reading json messages from kafka . Topics
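The suggested baseline, as a sketch (stream name assumed): drop everything except printing the raw pairs, to confirm data is actually arriving from every Kafka partition before adding the JSON parsing back in.

```scala
// Debugging baseline: no SQLContext, no conditional.
kafkaStream.foreachRDD { rdd =>
  rdd.foreach(println) // prints (key, value) tuples from all partitions
}
```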

Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Diwakar Dhanuskodi
Hi, I am reading json messages from Kafka. The topic has 2 partitions. When running the streaming job using spark-submit, I can see that *val dataFrame = sqlContext.read.json(rdd.map(_._2))* executes indefinitely. Am I doing something wrong here? Below is the code. This environment is a Cloudera sandbox