Re: Spark streaming cannot receive any message from Kafka

2014-11-18 Thread Bill Jay
Hi Jerry,

I looked at the KafkaUtils.createStream API and found that
spark.default.parallelism is actually specified in SparkConf instead. I do
not remember the exact stack trace of the exception, but it was thrown when
createStream was called without spark.default.parallelism set. The error
message basically showed a failure to parse an empty string into an Int
when spark.default.parallelism was not specified.
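
For reference, here is a minimal sketch of the fix (assuming Spark 1.1.0
with the spark-streaming-kafka artifact; the app name, ZooKeeper quorum,
topic, and parallelism value below are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Setting spark.default.parallelism explicitly in SparkConf avoids the
// empty-string-to-Int parse error when KafkaUtils.createStream is called.
// The value 4 is arbitrary; tune it to the cluster.
val conf = new SparkConf()
  .setAppName("kafka-consumer")
  .set("spark.default.parallelism", "4")
val ssc = new StreamingContext(conf, Seconds(10))
val lines = KafkaUtils.createStream(ssc, "zk1:2181", "my-group",
  Map("mytopic" -> 1)).map(_._2)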

Bill

On Mon, Nov 17, 2014 at 4:45 PM, Shao, Saisai saisai.s...@intel.com wrote:

  Hi Bill,



 Would you mind describing what you found a little more specifically? I'm
 not sure there's a parameter in KafkaUtils.createStream where you can
 specify the Spark parallelism; also, what is the exception stack?



 Thanks

 Jerry



 *From:* Bill Jay [mailto:bill.jaypeter...@gmail.com]
 *Sent:* Tuesday, November 18, 2014 2:47 AM
 *To:* Helena Edelson
 *Cc:* Jay Vyas; u...@spark.incubator.apache.org; Tobias Pfeiffer; Shao,
 Saisai

 *Subject:* Re: Spark streaming cannot receive any message from Kafka



 Hi all,



 I found the reason for this issue. It seems that in the new version, if I
 do not specify spark.default.parallelism when calling
 KafkaUtils.createStream, an exception is thrown at the Kafka stream
 creation stage. In previous versions, it seems Spark would fall back to a
 default value.



 Thanks!



 Bill



 On Thu, Nov 13, 2014 at 5:00 AM, Helena Edelson 
 helena.edel...@datastax.com wrote:

 I encounter no issues with streaming from Kafka to Spark in 1.1.0. Do you
 perhaps have a version conflict?

 Helena

 On Nov 13, 2014 12:55 AM, Jay Vyas jayunit100.apa...@gmail.com wrote:

  Yup, very important that n > 1 for Spark Streaming jobs; if local, use
 local[2].



 The thing to remember is that your Spark receiver will take a thread to
 itself to produce data, so you need another thread to consume it.



 In a cluster manager like YARN or Mesos, the word thread is not used
 anymore (I guess it has a different meaning): you need two or more free
 compute slots, and that should be guaranteed by checking how many free
 node managers are running, etc.


 On Nov 12, 2014, at 7:53 PM, Shao, Saisai saisai.s...@intel.com wrote:

  Did you configure the Spark master as local? It should be local[n], with
 n > 1, for local mode. Besides, there's a Kafka word count example among
 the Spark Streaming examples; you can try that. I've tested with the
 latest master, and it's OK.



 Thanks

 Jerry



 *From:* Tobias Pfeiffer [mailto:t...@preferred.jp]
 *Sent:* Thursday, November 13, 2014 8:45 AM
 *To:* Bill Jay
 *Cc:* u...@spark.incubator.apache.org
 *Subject:* Re: Spark streaming cannot receive any message from Kafka



 Bill,



   However, now that I am using Spark 1.1.0, the Spark streaming job
 cannot receive any messages from Kafka. I have not made any change to the
 code.



 Do you see any suspicious messages in the log output?



 Tobias







Re: Spark streaming cannot receive any message from Kafka

2014-11-17 Thread Bill Jay
Hi all,

I found the reason for this issue. It seems that in the new version, if I do
not specify spark.default.parallelism when calling KafkaUtils.createStream,
an exception is thrown at the Kafka stream creation stage. In previous
versions, it seems Spark would fall back to a default value.

Thanks!

Bill

On Thu, Nov 13, 2014 at 5:00 AM, Helena Edelson helena.edel...@datastax.com
 wrote:

 I encounter no issues with streaming from Kafka to Spark in 1.1.0. Do you
 perhaps have a version conflict?

 Helena
 On Nov 13, 2014 12:55 AM, Jay Vyas jayunit100.apa...@gmail.com wrote:

 Yup, very important that n > 1 for Spark Streaming jobs; if local, use
 local[2].

 The thing to remember is that your Spark receiver will take a thread to
 itself to produce data, so you need another thread to consume it.

 In a cluster manager like YARN or Mesos, the word thread is not used
 anymore (I guess it has a different meaning): you need two or more free
 compute slots, and that should be guaranteed by checking how many free
 node managers are running, etc.

 On Nov 12, 2014, at 7:53 PM, Shao, Saisai saisai.s...@intel.com
 wrote:

  Did you configure the Spark master as local? It should be local[n], with
 n > 1, for local mode. Besides, there's a Kafka word count example among
 the Spark Streaming examples; you can try that. I've tested with the
 latest master, and it's OK.



 Thanks

 Jerry



 *From:* Tobias Pfeiffer [mailto:t...@preferred.jp]
 *Sent:* Thursday, November 13, 2014 8:45 AM
 *To:* Bill Jay
 *Cc:* u...@spark.incubator.apache.org
 *Subject:* Re: Spark streaming cannot receive any message from Kafka



 Bill,



   However, now that I am using Spark 1.1.0, the Spark streaming job
 cannot receive any messages from Kafka. I have not made any change to
 the code.



 Do you see any suspicious messages in the log output?



 Tobias






RE: Spark streaming cannot receive any message from Kafka

2014-11-17 Thread Shao, Saisai
Hi Bill,

Would you mind describing what you found a little more specifically? I'm not
sure there's a parameter in KafkaUtils.createStream where you can specify the
Spark parallelism; also, what is the exception stack?

Thanks
Jerry

From: Bill Jay [mailto:bill.jaypeter...@gmail.com]
Sent: Tuesday, November 18, 2014 2:47 AM
To: Helena Edelson
Cc: Jay Vyas; u...@spark.incubator.apache.org; Tobias Pfeiffer; Shao, Saisai
Subject: Re: Spark streaming cannot receive any message from Kafka

Hi all,

I found the reason for this issue. It seems that in the new version, if I do
not specify spark.default.parallelism when calling KafkaUtils.createStream, an
exception is thrown at the Kafka stream creation stage. In previous versions,
it seems Spark would fall back to a default value.

Thanks!

Bill

On Thu, Nov 13, 2014 at 5:00 AM, Helena Edelson 
helena.edel...@datastax.com wrote:

I encounter no issues with streaming from Kafka to Spark in 1.1.0. Do you
perhaps have a version conflict?

Helena
On Nov 13, 2014 12:55 AM, Jay Vyas 
jayunit100.apa...@gmail.com wrote:
Yup, very important that n > 1 for Spark Streaming jobs; if local, use
local[2].

The thing to remember is that your Spark receiver will take a thread to itself
to produce data, so you need another thread to consume it.

In a cluster manager like YARN or Mesos, the word thread is not used anymore
(I guess it has a different meaning): you need two or more free compute slots,
and that should be guaranteed by checking how many free node managers are
running, etc.

On Nov 12, 2014, at 7:53 PM, Shao, Saisai 
saisai.s...@intel.com wrote:
Did you configure the Spark master as local? It should be local[n], with
n > 1, for local mode. Besides, there's a Kafka word count example among the
Spark Streaming examples; you can try that. I've tested with the latest
master, and it's OK.

Thanks
Jerry

From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: Thursday, November 13, 2014 8:45 AM
To: Bill Jay
Cc: u...@spark.incubator.apache.org
Subject: Re: Spark streaming cannot receive any message from Kafka

Bill,

However, now that I am using Spark 1.1.0, the Spark streaming job cannot
receive any messages from Kafka. I have not made any change to the code.

Do you see any suspicious messages in the log output?

Tobias




Re: Spark streaming cannot receive any message from Kafka

2014-11-13 Thread Helena Edelson
I encounter no issues with streaming from Kafka to Spark in 1.1.0. Do you
perhaps have a version conflict?

Helena
On Nov 13, 2014 12:55 AM, Jay Vyas jayunit100.apa...@gmail.com wrote:

 Yup, very important that n > 1 for Spark Streaming jobs; if local, use
 local[2].

 The thing to remember is that your Spark receiver will take a thread to
 itself to produce data, so you need another thread to consume it.

 In a cluster manager like YARN or Mesos, the word thread is not used
 anymore (I guess it has a different meaning): you need two or more free
 compute slots, and that should be guaranteed by checking how many free
 node managers are running, etc.

 On Nov 12, 2014, at 7:53 PM, Shao, Saisai saisai.s...@intel.com wrote:

  Did you configure the Spark master as local? It should be local[n], with
 n > 1, for local mode. Besides, there's a Kafka word count example among
 the Spark Streaming examples; you can try that. I've tested with the
 latest master, and it's OK.



 Thanks

 Jerry



 *From:* Tobias Pfeiffer [mailto:t...@preferred.jp]
 *Sent:* Thursday, November 13, 2014 8:45 AM
 *To:* Bill Jay
 *Cc:* u...@spark.incubator.apache.org
 *Subject:* Re: Spark streaming cannot receive any message from Kafka



 Bill,



   However, now that I am using Spark 1.1.0, the Spark streaming job
 cannot receive any messages from Kafka. I have not made any change to the
 code.



 Do you see any suspicious messages in the log output?



 Tobias






Spark streaming cannot receive any message from Kafka

2014-11-12 Thread Bill Jay
Hi all,

I have a Spark streaming job which constantly receives messages from Kafka.
I was using Spark 1.0.2 and the job had been running for a month. However,
after switching to Spark 1.1.0, the Spark streaming job cannot receive any
messages from Kafka. I have not made any change to the code. Below please
find the code snippet for the Kafka consumer:

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val Array(zkQuorum, topics, mysqlTable) = args
val currentTime: Long = System.currentTimeMillis
// Use a fresh consumer group per run
val group = "my-group-" + currentTime.toString
println(topics)
println(zkQuorum)
val numThreads = 1
val topicMap = topics.split(",").map((_, numThreads)).toMap
ssc = new StreamingContext(conf, Seconds(batch))
ssc.checkpoint(hadoopOutput + "checkpoint")
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
val lineCounts = lines.count.saveAsTextFiles(hadoopOutput + "counts/result")


I checked the values in topics and zkQuorum and they are correct. I use the
same information with kafka-console-consumer and it works correctly.

Does anyone know the reason? Thanks!

Bill


Re: Spark streaming cannot receive any message from Kafka

2014-11-12 Thread Tobias Pfeiffer
Bill,

However, now that I am using Spark 1.1.0, the Spark streaming job
 cannot receive any messages from Kafka. I have not made any change to the
 code.


Do you see any suspicious messages in the log output?

Tobias


Re: Spark streaming cannot receive any message from Kafka

2014-11-12 Thread Bill Jay
Hi all,

Thanks for the information. I am running Spark streaming in a YARN cluster
and the configuration should be correct. I followed the KafkaWordCount
example to write the current code three months ago, and it has been working
for several months. The messages are in JSON format. Actually, this code
worked a few days ago, but now it is not working. Below please find my
spark-submit script:


SPARK_BIN=/home/hadoop/spark/bin/
# $1 = application jar; $2 = number of executors;
# $3 $4 $5 presumably map to the app's args (zkQuorum, topics, mysqlTable
# per the code snippet above)
$SPARK_BIN/spark-submit \
 --class com.test \
 --master yarn-cluster \
 --deploy-mode cluster \
 --verbose \
 --driver-memory 20G \
 --executor-memory 20G \
 --executor-cores 6 \
 --num-executors $2 \
 $1 $3 $4 $5

Thanks!

 Bill

On Wed, Nov 12, 2014 at 4:53 PM, Shao, Saisai saisai.s...@intel.com wrote:

  Did you configure the Spark master as local? It should be local[n], with
 n > 1, for local mode. Besides, there's a Kafka word count example among
 the Spark Streaming examples; you can try that. I've tested with the
 latest master, and it's OK.



 Thanks

 Jerry



 *From:* Tobias Pfeiffer [mailto:t...@preferred.jp]
 *Sent:* Thursday, November 13, 2014 8:45 AM
 *To:* Bill Jay
 *Cc:* u...@spark.incubator.apache.org
 *Subject:* Re: Spark streaming cannot receive any message from Kafka



 Bill,



   However, now that I am using Spark 1.1.0, the Spark streaming job
 cannot receive any messages from Kafka. I have not made any change to the
 code.



 Do you see any suspicious messages in the log output?



 Tobias





RE: Spark streaming cannot receive any message from Kafka

2014-11-12 Thread Shao, Saisai
Did you configure the Spark master as local? It should be local[n], with
n > 1, for local mode. Besides, there's a Kafka word count example among the
Spark Streaming examples; you can try that. I've tested with the latest
master, and it's OK.

Thanks
Jerry

From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: Thursday, November 13, 2014 8:45 AM
To: Bill Jay
Cc: u...@spark.incubator.apache.org
Subject: Re: Spark streaming cannot receive any message from Kafka

Bill,

However, now that I am using Spark 1.1.0, the Spark streaming job cannot
receive any messages from Kafka. I have not made any change to the code.

Do you see any suspicious messages in the log output?

Tobias



Re: Spark streaming cannot receive any message from Kafka

2014-11-12 Thread Jay Vyas
Yup, very important that n > 1 for Spark Streaming jobs; if local, use
local[2].

The thing to remember is that your Spark receiver will take a thread to itself
to produce data, so you need another thread to consume it.

In a cluster manager like YARN or Mesos, the word thread is not used anymore
(I guess it has a different meaning): you need two or more free compute slots,
and that should be guaranteed by checking how many free node managers are
running, etc.
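
A minimal sketch of the local-mode setup (illustrative only; the app name
and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// local[2] reserves two cores: one for the receiver thread pulling data
// from Kafka, and one for the tasks that process the received batches.
// local or local[1] would starve the processing side.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("kafka-streaming-local")
val ssc = new StreamingContext(conf, Seconds(10))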

 On Nov 12, 2014, at 7:53 PM, Shao, Saisai saisai.s...@intel.com wrote:
 
 Did you configure the Spark master as local? It should be local[n], with 
 n > 1, for local mode. Besides, there's a Kafka word count example among 
 the Spark Streaming examples; you can try that. I've tested with the 
 latest master, and it's OK.
  
 Thanks
 Jerry
  
 From: Tobias Pfeiffer [mailto:t...@preferred.jp] 
 Sent: Thursday, November 13, 2014 8:45 AM
 To: Bill Jay
 Cc: u...@spark.incubator.apache.org
 Subject: Re: Spark streaming cannot receive any message from Kafka
  
 Bill,
  
 However, now that I am using Spark 1.1.0, the Spark streaming job 
 cannot receive any messages from Kafka. I have not made any change to the 
 code.
  
 Do you see any suspicious messages in the log output?
  
 Tobias