Re: Spark Streaming + Kafka + scala job message read issue

vivek.meghanathan Fri, 15 Jan 2016 10:37:41 -0800

All,
The issue was related to Apache Cassandra. I switched from Apache Cassandra to 
DataStax Cassandra and the issue is resolved. I also changed the Spark 
version to 1.3.


There is some serious issue between the Spark Cassandra connector and 
Apache Cassandra 2.1+ when they are used in Spark Streaming jobs.

Regards
Vivek

On Tue, Jan 05, 2016 at 4:38 pm, Vivek Meghanathan (WT01 - NEP) 
<vivek.meghanat...@wipro.com> wrote:

Hello All,

After investigating further with a test program, we were able to read the 
Kafka input messages using Spark Streaming.

Once we add a particular line that performs map, reduce and groupByKey 
(all written in a single line), we no longer see the input message details in 
the logs. We increased the batch interval to 5 seconds and removed 
numtasks (it was set to 10). After this change the Kafka messages 
started to get processed, but processing takes a long time.

This works fine on our local lab server; the problem occurs only on the Google 
Compute Engine server. The local lab server has a lower spec (8 CPUs with 8 GB 
RAM), while the cloud server is a high-memory one (8 CPUs with 30 GB RAM). As 
far as I can see, execution is much faster on the Google platform, but somehow 
the job processing gets messed up.

Any suggestions?


Regards,
Vivek M



From: Vivek Meghanathan (WT01 - NEP)
Sent: 27 December 2015 11:08
To: Bryan <bryan.jeff...@gmail.com>
Cc: Vivek Meghanathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; 
duc.was.h...@gmail.com; user@spark.apache.org
Subject: Re: Spark Streaming + Kafka + scala job message read issue


Hi Bryan,
Yes, we are using only 1 thread per topic, as we have only one Kafka server with 
1 partition.
What kind of logs will tell us which offset Spark Streaming is reading from in 
Kafka, or whether it is resetting something without reading?

Regards
Vivek


On Sun, Dec 27, 2015 at 12:03 am, Bryan 
<bryan.jeff...@gmail.com> wrote:

Vivek,

Where you’re using numThreads – look at the documentation for createStream. I 
believe that number should be the number of partitions to consume.



From: vivek.meghanat...@wipro.com
Sent: Friday, December 25, 2015 11:39 PM
To: bryan.jeff...@gmail.com
Cc: duc.was.h...@gmail.com; vivek.meghanat...@wipro.com; 
user@spark.apache.org
Subject: Re: Spark Streaming + Kafka + scala job message read issue


Hi Bryan, PhuDuc,

All 8 jobs consume 8 different IN topics: there are 8 different Scala jobs, and 
the topic map of each (shown below) specifies only 1 thread. In this case 
the group should not be a problem, right?

Here is the complete flow: Spring MVC sends messages to Kafka, Spark 
Streaming reads them and sends messages back to Kafka; in some cases the jobs 
only update data in Cassandra. Spring then reads the response messages.
I can see that the messages always reach Kafka (checked through the console 
consumer).

Regards
Vivek


On Sat, Dec 26, 2015 at 2:42 am, Bryan 
<bryan.jeff...@gmail.com> wrote:

Agreed. I did not see that they were using the same group name.



From: PhuDuc Nguyen <duc.was.h...@gmail.com>
Sent: Friday, December 25, 2015 3:35 PM
To: vivek.meghanat...@wipro.com
Cc: user@spark.apache.org
Subject: Re: Spark Streaming + Kafka + scala job message read issue

Vivek,

Did you say you have 8 Spark jobs that are consuming from the same topic and 
all jobs are using the same consumer group name? If so, each job would get a 
subset of messages from that Kafka topic, i.e. each job would get 1 out of 8 
messages from that topic. Is that your intent?
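
To make the point about shared group names concrete, here is a minimal sketch in plain Scala. It is a simplified model of range-style partition assignment, not Kafka's actual implementation: within one consumer group, each partition of a topic has a single owner, so consumers in the same group split the topic between them. The object name, the `assign` function and the job names are hypothetical and exist only for illustration.

```scala
object ConsumerGroupModel {
  // Simplified range-style assignment (an assumption for illustration):
  // the partitions of one topic are divided among the consumers of ONE
  // group, so no partition is owned by two consumers of that group.
  def assign(partitions: Int, consumers: Seq[String]): Map[String, List[Int]] = {
    require(consumers.nonEmpty, "at least one consumer is needed")
    val sorted = consumers.sorted
    val perConsumer = partitions / sorted.size
    val extra = partitions % sorted.size
    sorted.zipWithIndex.map { case (consumer, i) =>
      // The first `extra` consumers each take one additional partition.
      val start = i * perConsumer + math.min(i, extra)
      val count = perConsumer + (if (i < extra) 1 else 0)
      consumer -> (start until start + count).toList
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val jobs = (1 to 8).map(i => s"job-$i") // hypothetical job names

    // Same topic, same group, 8 partitions: each job owns one partition.
    println(assign(8, jobs))

    // Same topic, same group, 1 partition: a single job owns it, the rest idle.
    println(assign(1, jobs))

    // Same topic, one group per job: every job is alone in its group
    // and therefore owns the partition itself.
    jobs.foreach(job => println(assign(1, Seq(job))))
  }
}
```

In this model the split only happens between consumers that share both the group and the topic, so jobs that read different topics are not affected by sharing a group name.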

regards,
Duc






On Thu, Dec 24, 2015 at 7:20 AM, 
<vivek.meghanat...@wipro.com> wrote:
We are using the older receiver-based approach. The number of partitions is 1 
(we have a single-node Kafka) and we use a single thread per topic, but we 
still have the problem. Please see the API we use. All 8 Spark jobs use the 
same group name. Is that a problem?

// Number of threads used here is 1
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val searches = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap)
  .map(line => parse(line._2).extract[Search])
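
As a small illustration of what the `topicMap` expression above produces, the sketch below runs the same expression on made-up inputs in plain Scala, without Spark. The object name, the helper `buildTopicMap` and the topic names are hypothetical. In the receiver-based `createStream` API, the `Int` value is the number of consumer threads (documented as partitions to consume) for that topic.

```scala
object TopicMapExample {
  // Same expression as in the message above, wrapped in a helper.
  // `topics` is a comma-separated list; `numThreads` arrives as a string.
  def buildTopicMap(topics: String, numThreads: String): Map[String, Int] =
    topics.split(",").map((_, numThreads.toInt)).toMap

  def main(args: Array[String]): Unit = {
    // Two hypothetical topics, one thread each.
    println(buildTopicMap("search-in,booking-in", "1"))

    // A space after the comma becomes part of the topic name, because
    // split(",") does not trim its results.
    println(buildTopicMap("search-in, booking-in", "1").keys.toList)
  }
}
```

One thing this makes visible: because `split(",")` does not trim, a topic list written with spaces after the commas yields keys such as `" booking-in"`, which would not match the real topic name.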


Regards,
Vivek M
From: Bryan [mailto:bryan.jeff...@gmail.com]
Sent: 24 December 2015 17:20
To: Vivek Meghanathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; 
user@spark.apache.org
Subject: RE: Spark Streaming + Kafka + scala job message read issue

Are you using a direct stream consumer, or the older receiver-based consumer? 
If the latter, does the number of partitions you’ve specified for your topic 
match the number of partitions in the topic on Kafka?

That would be a possible cause – as you might receive all data from a given 
partition while missing data from other partitions.
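
For comparison with the receiver-based approach, here is a rough sketch of the direct stream consumer for Spark 1.3+ with the Kafka 0.8 integration (the `spark-streaming-kafka` artifact). It is not what the jobs in this thread use. The broker address, topic name and application name are placeholder assumptions. The sketch also logs the offset range read in each batch, which is one way to see which offsets Spark Streaming is consuming.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder application name; the master is expected from spark-submit.
    val conf = new SparkConf().setAppName("direct-stream-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Assumed broker address and topic name, for illustration only.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("search-in")

    // The direct approach creates one RDD partition per Kafka partition,
    // so no thread count per topic is specified.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.foreachRDD { rdd =>
      // The cast only works on the RDD that comes straight from the direct stream.
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach { r =>
        println(s"topic=${r.topic} partition=${r.partition} " +
          s"from=${r.fromOffset} until=${r.untilOffset}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the logged `from` and `until` offsets advance by fewer messages than the producer sent, the gap is on the consuming side; if they advance as expected, the loss is more likely later in the job.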

Regards,

Bryan Jeffrey



From: vivek.meghanat...@wipro.com
Sent: Thursday, December 24, 2015 5:22 AM
To: user@spark.apache.org
Subject: Spark Streaming + Kafka + scala job message read issue

Hi All,



We are using Bitnami Kafka 0.8.2 + Spark 1.5.2 on Google Cloud Platform. Our 
Spark Streaming job (consumer) is not receiving all the messages sent to the 
specific topic. It receives 1 out of ~50 messages (identified by adding logging 
to the job stream). We are not seeing any errors in the Kafka logs and are 
unable to debug further from the Kafka layer. The console consumer shows that 
the INPUT topic messages are received in the console, but they are not reaching 
the Spark-Kafka integration stream. Any thoughts on how to debug this issue? 
Another topic is working fine in the same setup.

We tried again with Spark 1.3.0 and Kafka 0.8.1.1, which has the same issue. 
All these jobs work fine on our local lab servers.

Regards,
Vivek M
The information contained in this electronic message and any attachments to 
this message are intended for the exclusive use of the addressee(s) and may 
contain proprietary, confidential or privileged information. If you are not the 
intended recipient, you should not disseminate, distribute or copy this e-mail. 
Please notify the sender immediately and destroy all copies of this message and 
any attachments. WARNING: Computer viruses can be transmitted via email. The 
recipient should check this email and any attachments for the presence of 
viruses. The company accepts no liability for any damage caused by any virus 
transmitted by this email. www.wipro.com<http://www.wipro.com>

