Re: spark.streaming.kafka.maxRatePerPartition for direct stream

2015-10-02 Thread Sourabh Chandak
> That depends on your job, your cluster resources, the number of seconds per batch...
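
Cody's answer in code form is a one-line configuration cap; a minimal sketch, assuming the Spark 1.3+ direct stream, with an illustrative value that should be tuned to per-partition throughput and the batch interval:

    import org.apache.spark.SparkConf

    // Illustrative: with a 2-second batch, 10,000 records/sec/partition means
    // each batch pulls at most 20,000 records from every Kafka partition.
    val conf = new SparkConf()
      .setAppName("kafka-direct-stream")
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")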

Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
Hi, I have a receiverless Kafka streaming job that started yesterday evening and ran fine until 4 PM today. Since then, checkpoint writing has slowed down and the job can no longer catch up with the incoming data. We are using the DSE stack with Spark 1.2 and Cassandra for

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
> Why are you sure it's checkpointing speed? Have you compared it against checkpointing to HDFS, S3, or local disk? On Fri, Oct 2, 2015 at 1:17 AM, Sourabh Chandak <sourabh3...@gmail.com> wrote: >> Hi, I have a receiverless kafka s
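
A sketch of that comparison: the checkpoint directory is just a filesystem URI, so the backend can be swapped with no other code changes (paths are illustrative; s3n was the usual scheme on Spark 1.2):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("ckpt-compare"), Seconds(10))
    // Run the same job against each store and compare batch times in the UI.
    ssc.checkpoint("hdfs://namenode:8020/checkpoints/app")  // HDFS
    // ssc.checkpoint("s3n://my-bucket/checkpoints/app")    // S3
    // ssc.checkpoint("file:///tmp/checkpoints/app")        // local disk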

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
Tried using local checkpointing as well, and even that becomes slow after some time. Any idea what can be wrong? Thanks, Sourabh On Fri, Oct 2, 2015 at 9:35 AM, Sourabh Chandak <sourabh3...@gmail.com> wrote: > I can see the entries processed in the table very fast but after that i

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
Is it metadata checkpointing (which saves the DStream metadata), or RDD checkpointing (which saves the actual intermediate RDD data)? TD On Fri, Oct 2, 2015 at 2:56 PM, Sourabh Chandak <sourabh3...@gmail.com> wrote: >> Tried using local checkpointing as well
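
TD's distinction maps to two separate calls; a sketch with illustrative paths, intervals, and a placeholder source:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._ // pair ops on Spark 1.2

    val ssc = new StreamingContext(new SparkConf().setAppName("two-checkpoints"), Seconds(10))

    // Metadata checkpointing: saves the DStream graph, offsets, and
    // configuration so a failed driver can recover. Set once per context.
    ssc.checkpoint("hdfs://namenode:8020/checkpoints/app")

    // RDD (data) checkpointing: periodically saves the actual intermediate
    // RDDs of a stateful stream, truncating their lineage. Set per DStream.
    val lines = ssc.socketTextStream("localhost", 9999) // placeholder source
    val counts = lines.map((_, 1L)).updateStateByKey(
      (vals: Seq[Long], state: Option[Long]) => Some(state.getOrElse(0L) + vals.sum))
    counts.checkpoint(Seconds(30))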

spark.streaming.kafka.maxRatePerPartition for direct stream

2015-10-01 Thread Sourabh Chandak
Hi, I am writing a Spark Streaming job using the direct stream method for Kafka and want to handle the case of checkpoint failure, when we'll have to reprocess all the data from the beginning. By default, for every new checkpoint it tries to load everything from each partition, and that takes a lot

Re: Adding / Removing worker nodes for Spark Streaming

2015-09-28 Thread Sourabh Chandak
I also have the same use case as Augustus, and have some basic questions about recovery from checkpoint. I have a 10-node Kafka cluster and a 30-node Spark cluster running a streaming job; how is the (topic, partition) data handled in checkpointing? The scenario I want to understand is, in case of
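
On the recovery half of the question: with the direct stream, the (topic, partition) offset ranges of each batch live in the checkpoint, and the standard driver-recovery pattern is StreamingContext.getOrCreate. A sketch; the path, batch interval, and names are illustrative:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs://namenode:8020/checkpoints/app" // illustrative

    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(new SparkConf().setAppName("recoverable"), Seconds(10))
      ssc.checkpoint(checkpointDir)
      // build the DStream graph here; it is serialized into the checkpoint
      ssc
    }

    // Clean start: runs createContext(). After a driver failure: rebuilds the
    // graph and resumes from the checkpointed offsets instead.
    val context = StreamingContext.getOrCreate(checkpointDir, createContext _)
    context.start()
    context.awaitTermination()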

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
ing("Throwing this errir\n")), ok => ok ) } On Thu, Sep 24, 2015 at 3:00 PM, Sourabh Chandak <sourabh3...@gmail.com> wrote: > I was able to get pass this issue. I was pointing the SSL port whereas > SimpleConsumer should point to the PLAINTEXT port. But after fixing that

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
driver memory, or put a profiler on it to see what's taking up heap. On Thu, Sep 24, 2015 at 3:51 PM, Sourabh Chandak <sourabh3...@gmail.com> wrote: >> Adding Cody and Sriharsha On Thu, Sep 24, 2015 at 1:25 PM, Sourabh Chandak <sourab

ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
Hi, I have ported receiverless Spark Streaming for Kafka to Spark 1.2 and am trying to run a streaming job to consume data from my broker, but I am getting the following error: 15/09/24 20:17:45 ERROR BoundedByteBufferReceive: OOME with size 352518400 java.lang.OutOfMemoryError: Java heap
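
Two notes on this error, anticipating the rest of the thread: 352518400 is 0x15030100, the first four bytes of a TLS alert record read as a message size, which is the classic symptom of a plaintext consumer hitting an SSL port (the port mix-up found later in the thread). In normal operation the buffer is bounded by the old consumer's fetch size; a sketch of capping it in the direct stream's Kafka parameters (broker and value illustrative):

    // kafkaParams for the receiverless/direct stream; the old consumer
    // allocates a buffer of up to fetch.message.max.bytes per partition fetch.
    val kafkaParams = Map(
      "metadata.broker.list"    -> "broker1:9092", // must be the PLAINTEXT port
      "fetch.message.max.bytes" -> (8 * 1024 * 1024).toString // 8 MB, illustrative
    )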

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
Adding Cody and Sriharsha On Thu, Sep 24, 2015 at 1:25 PM, Sourabh Chandak <sourabh3...@gmail.com> wrote: > Hi, I have ported receiverless Spark Streaming for Kafka to Spark 1.2 and am trying to run a streaming job to consume data from my broker, but I am

Re: SSL between Kafka and Spark Streaming API

2015-08-28 Thread Sourabh Chandak
Can we use the existing Kafka Spark Streaming jar to connect to a Kafka server running in SSL mode? We are fine with a non-SSL consumer, as our Kafka cluster and Spark cluster are in the same network. Thanks, Sourabh On Fri, Aug 28, 2015 at 12:03 PM, Gwen Shapira g...@confluent.io wrote: I can't
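
One caveat worth adding: the 0.8 SimpleConsumer underlying the existing jar has no SSL support at all; SSL arrived with Kafka 0.9's new consumer, which the later spark-streaming-kafka-0-10 integration uses. A sketch of the consumer properties that integration would take (paths and password are placeholders):

    // New-consumer (Kafka 0.9+) SSL settings; these do not apply to the
    // 0.8 SimpleConsumer-based streaming jar discussed in this thread.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"       -> "broker1:9093",
      "security.protocol"       -> "SSL",
      "ssl.truststore.location" -> "/etc/kafka/client.truststore.jks",
      "ssl.truststore.password" -> "changeit"
    )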

Re: Reliable Streaming Receiver

2015-08-05 Thread Sourabh Chandak
Direct Kafka approach. That is quite flexible, can give an exactly-once guarantee without the WAL, and is more robust and performant. Consider using it. On Wed, Aug 5, 2015 at 5:48 PM, Sourabh Chandak sourabh3...@gmail.com wrote: Hi, I am trying to replicate the Kafka Streaming Receiver for a custom
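
TD's suggestion as a minimal sketch against the Spark 1.3+ direct stream API (assumes an existing StreamingContext ssc; broker and topic are illustrative):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // No receiver and no WAL: each batch reads its exact offset range from
    // Kafka, which is what makes exactly-once output achievable (given
    // idempotent or transactional writes downstream).
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc,
      Map("metadata.broker.list" -> "broker1:9092"),
      Set("my-topic")
    )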

Reliable Streaming Receiver

2015-08-05 Thread Sourabh Chandak
Hi, I am trying to replicate the Kafka Streaming Receiver for a custom version of Kafka and want to create a reliable receiver. The current implementation uses BlockGenerator, which is a private class inside Spark Streaming, hence I can't use it in my code. Can someone help me with some resources
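
For what it's worth, a reliable receiver can be built on the public Receiver API alone: the multi-record store() variants block until Spark has stored the block (replicated, at a *_2 storage level), so the source can be acknowledged only afterwards, with no need for the private BlockGenerator. A sketch; fetchBatch and ack are hypothetical stand-ins for the custom Kafka client:

    import scala.collection.mutable.ArrayBuffer
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class MyReliableReceiver
        extends Receiver[Array[Byte]](StorageLevel.MEMORY_AND_DISK_SER_2) {

      override def onStart(): Unit = {
        new Thread("custom-kafka-receiver") {
          override def run(): Unit = {
            while (!isStopped()) {
              val batch = fetchBatch()
              store(batch) // blocks until Spark has reliably stored the batch
              ack(batch)   // commit offsets only after store() returns
            }
          }
        }.start()
      }

      override def onStop(): Unit = { /* close the client here */ }

      // Hypothetical placeholders for the custom Kafka client:
      private def fetchBatch(): ArrayBuffer[Array[Byte]] = new ArrayBuffer
      private def ack(batch: ArrayBuffer[Array[Byte]]): Unit = ()
    }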