Figured it out. All I was doing wrong was testing it in a pseudo-node VM
with 1 core; the tasks were starving for CPU.
In the production cluster this works just fine.
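For anyone who hits the same hang: a minimal sketch of the local-mode fix, assuming a simple StreamingContext setup (the app name and batch interval here are made up):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// With setMaster("local") or "local[1]" there is only one task slot, so
// streaming tasks can starve for CPU and batches queue up indefinitely.
// Use at least two local threads when testing on a single machine.
val sparkConf = new SparkConf()
  .setAppName("KafkaJsonTest")
  .setMaster("local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(5))
```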
On Thu, Aug 11, 2016 at 12:45 AM, Diwakar Dhanuskodi <
diwakar.dhanusk...@gmail.com> wrote:
> Checked executor logs and UI .
Checked executor logs and the UI. There is no error message or anything like
that. Whenever there is an action, it just waits.
There is data in the partitions. I could use kafka-simple-consumer-shell and
print all the data to the console. Am I doing anything wrong in foreachRDD?
This just works fine with single
zookeeper.connect is irrelevant.
Did you look at your executor logs?
Did you look at the UI for the (probably failed) stages?
Are you actually producing data into all of the kafka partitions?
If you use kafka-simple-consumer-shell.sh to read that partition, do
you get any data?
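For reference, checking a single partition with the simple consumer shell looks roughly like this (broker address, topic name, and partition number are placeholders):

```shell
# Read partition 1 of the topic from the earliest offset (-2) to verify
# that data actually exists in that partition.
bin/kafka-simple-consumer-shell.sh \
  --broker-list localhost:9092 \
  --topic topic.name \
  --partition 1 \
  --offset -2
```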
On Wed, Aug 10,
Hi Cody,
Just added zookeeper.connect to kafkaParams. It couldn't come out of the
batch window, and other batches are queued. I could see foreach(println) on
the DataFrame printing one partition's data but not the other's.
Couldn't see any errors in the logs.
val brokers = "localhost:9092,localhost:9093"
Those logs you're posting are from right after your failure; they don't
include what actually went wrong when attempting to read the json. Look at
your logs more carefully.
On Aug 10, 2016 2:07 AM, "Diwakar Dhanuskodi"
wrote:
> Hi Siva,
>
> With below code, it is stuck
I am testing with one partition now. I am using Kafka 0.9 and Spark 1.6.1
(Scala 2.11). Just start with one topic first and then add more. I am not
partitioning the topic.
HTH,
Regards,
Sivakumaran
> On 10-Aug-2016, at 5:56 AM, Diwakar Dhanuskodi
> wrote:
>
Hi Siva,
With the below code, it is stuck at
sqlContext.read.json(rdd.map(_._2)).toDF()
There are two partitions in the topic.
I am running Spark 1.6.2
val topics = "topic.name"
val brokers = "localhost:9092"
val topicsSet = topics.split(",").toSet
val sparkConf = new
Hi Siva,
Does the topic have partitions? Which version of Spark are you using?
On Wed, Aug 10, 2016 at 2:38 AM, Sivakumaran S wrote:
> Hi,
>
> Here is a working example I did.
>
> HTH
>
> Regards,
>
> Sivakumaran S
>
> val topics = "test"
> val brokers = "localhost:9092"
> val
It stops working at sqlContext.read.json(rdd.map(_._2)). Topics without
partitions work fine. Do I need to set any other configs?
val kafkaParams = Map[String, String](
  "bootstrap.servers" -> "localhost:9093,localhost:9092",
  "group.id" -> "xyz",
  "auto.offset.reset" -> "smallest")
Spark version is
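For reference, those params would typically feed a direct stream like this (a sketch; `ssc` and `topicsSet` are assumed from the other snippets in this thread):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Direct stream over String keys/values, using the params above.
val kafkaParams = Map[String, String](
  "bootstrap.servers" -> "localhost:9093,localhost:9092",
  "group.id" -> "xyz",
  "auto.offset.reset" -> "smallest")

val kafkaStream = KafkaUtils.createDirectStream[
  String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet)
```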
No, you don't need a conditional. read.json on an empty rdd will
return an empty dataframe. Foreach on an empty dataframe or an empty
rdd won't do anything (a task will still get run, but it won't do
anything).
Leave the conditional out. Add one thing at a time to the working
rdd.foreach
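i.e. something along these lines (a sketch; `kafkaStream` and `sqlContext` are taken from the snippets in this thread):

```scala
// No emptiness check needed: read.json on an empty RDD returns an empty
// DataFrame, and foreach over an empty DataFrame is effectively a no-op.
kafkaStream.foreachRDD { rdd =>
  val dataFrame = sqlContext.read.json(rdd.map(_._2))
  dataFrame.foreach(row => println(row))
}
```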
Hi Cody,
Without the conditional, it works fine. But any processing inside the
conditional gets stuck waiting or something.
I'm facing this issue with partitioned topics. I would need the conditional
to skip processing when a batch is empty.
kafkaStream.foreachRDD(
rdd => {
val dataFrame =
Hi,
Here is a working example I did.
HTH
Regards,
Sivakumaran S
val topics = "test"
val brokers = "localhost:9092"
val topicsSet = topics.split(",").toSet
val sparkConf = new SparkConf().setAppName("KafkaWeatherCalc").setMaster("local")
//spark://localhost:7077
val sc = new
Take out the conditional and the sqlContext and just do
rdd => {
rdd.foreach(println)
as a base line to see if you're reading the data you expect
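i.e. a baseline along these lines (a sketch; `kafkaStream` is the direct stream from the other snippets in this thread):

```scala
// Print raw message values only -- no SQLContext, no conditional -- to
// confirm data is actually being read from all partitions.
kafkaStream.foreachRDD { rdd =>
  rdd.foreach { case (_, value) => println(value) }
}
```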
On Tue, Aug 9, 2016 at 3:47 PM, Diwakar Dhanuskodi
wrote:
> Hi,
>
> I am reading json messages from kafka . Topics
Hi,
I am reading json messages from Kafka. The topic has 2 partitions. When
running the streaming job using spark-submit, I could see that val dataFrame
= sqlContext.read.json(rdd.map(_._2)) executes indefinitely. Am I doing
something wrong here? Below is the code. This environment is a Cloudera sandbox