Re: Spark + Kafka all messages being used in 1 batch

2016-03-06 Thread Shahbaz
- Do you happen to see how busy the nodes are in terms of CPU, and how much heap each executor is allocated?
- If there is enough capacity, you may want to increase the number of cores per executor to 2 and do the needed heap tweaking.
- How much time did it take to process 4M+
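[Editor's note: a rough way to sanity-check the sizing suggested above, using the 128 GB / 8-core nodes mentioned later in the thread. The 2-cores-per-executor figure comes from the suggestion; the 20% headroom factor is an assumption, not something stated in the thread.]

```python
# Back-of-envelope executor sizing for an 8-core, 128 GB node.
cores_per_node = 8
cores_per_executor = 2          # the tweak suggested above
ram_gb_per_node = 128
headroom = 0.8                  # leave ~20% for OS / off-heap overhead (assumed)

executors_per_node = cores_per_node // cores_per_executor
heap_gb_per_executor = ram_gb_per_node * headroom / executors_per_node

print(executors_per_node, heap_gb_per_executor)  # 4 executors, 25.6 GB heap each
```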

Re: Spark + Kafka all messages being used in 1 batch

2016-03-06 Thread Vinti Maheshwari
I have 2 machines in my cluster with the below specifications: 128 GB RAM and 8 cores each.

Regards,
~Vinti

On Sun, Mar 6, 2016 at 7:54 AM, Vinti Maheshwari wrote:
> Thanks Supreeth and Shahbaz. I will try adding
> spark.streaming.kafka.maxRatePerPartition.
>
> Hi

Re: Spark + Kafka all messages being used in 1 batch

2016-03-06 Thread Vinti Maheshwari
Thanks Supreeth and Shahbaz. I will try adding spark.streaming.kafka.maxRatePerPartition.

Hi Shahbaz, please see my comments, inline:
- Which version of Spark are you using? ==> *1.5.2*
- How big is the Kafka cluster? ==> *2 brokers*
- What is the message size and type? ==> *String, 9,550

Re: Spark + Kafka all messages being used in 1 batch

2016-03-05 Thread Supreeth
Try setting spark.streaming.kafka.maxRatePerPartition; this can help control the number of messages read from Kafka per partition by the Spark Streaming consumer.

-S

> On Mar 5, 2016, at 10:02 PM, Vinti Maheshwari wrote:
>
> Hello,
>
> I am trying to figure out why my
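[Editor's note: for context on what this setting buys you. spark.streaming.kafka.maxRatePerPartition is a per-partition, per-second cap, so the largest micro-batch Spark will build is rate x partitions x batch interval. The numbers below are illustrative assumptions; the thread does not state the topic's partition count or the batch interval.]

```python
# Upper bound on records per micro-batch once
# spark.streaming.kafka.maxRatePerPartition is set.
# All three inputs are assumed values, not figures from the thread.
max_rate_per_partition = 1000   # records/sec per Kafka partition (assumed)
num_partitions = 4              # Kafka topic partitions (assumed)
batch_interval_sec = 2          # streaming batch interval in seconds (assumed)

max_records_per_batch = max_rate_per_partition * num_partitions * batch_interval_sec
print(max_records_per_batch)  # 8000
```

With a cap like this in place, a backlog in Kafka is drained over many batches instead of being pulled into the first one.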

Spark + Kafka all messages being used in 1 batch

2016-03-05 Thread Vinti Maheshwari
Hello,

I am trying to figure out why my Kafka + Spark job is running slow. I found that Spark is consuming all the messages out of Kafka into a single batch and not sending any messages to the other batches.

2016/03/05 21:57:05