@koeninger.org>
> *Sent:* Thursday, October 1, 2015 11:46 PM
> *To:* Sourabh Chandak
> *Cc:* user
> *Subject:* Re: spark.streaming.kafka.maxRatePerPartition for direct stream
>
> That depends on your job, your cluster resources, the number of seconds
> per batch...
>
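To make the arithmetic behind that concrete (all numbers below are made up for illustration): with `spark.streaming.kafka.maxRatePerPartition` set, the ceiling on records per batch is rate × number of partitions × batch seconds.

```scala
// Hypothetical numbers, purely for illustration -- tune for your own job.
val maxRatePerPartition = 1000L  // spark.streaming.kafka.maxRatePerPartition
val numPartitions = 30L          // Kafka partitions for the topic
val batchSeconds = 10L           // streaming batch interval

// Upper bound on records the direct stream pulls in one batch:
val maxRecordsPerBatch = maxRatePerPartition * numPartitions * batchSeconds
println(maxRecordsPerBatch)  // 300000
```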
Hi,
I have a receiverless kafka streaming job which was started yesterday
evening and was running fine till 4 PM today. Suddenly after that, checkpoint
writing slowed down and it is now not able to catch up with the
incoming data. We are using the DSE stack with Spark 1.2 and Cassandra for
> Why are you sure it's checkpointing speed?
>
> Have you compared it against checkpointing to hdfs, s3, or local disk?
>
> On Fri, Oct 2, 2015 at 1:17 AM, Sourabh Chandak <sourabh3...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a receiverless kafka s
Tried using local checkpointing as well, and even that becomes slow after
some time. Any idea what can be wrong?
Thanks,
Sourabh
On Fri, Oct 2, 2015 at 9:35 AM, Sourabh Chandak <sourabh3...@gmail.com>
wrote:
> I can see the entries processed in the table very fast but after that i
ata), or RDD checkpointing
> (which saves the actual intermediate RDD data)
>
> TD
>
> On Fri, Oct 2, 2015 at 2:56 PM, Sourabh Chandak <sourabh3...@gmail.com>
> wrote:
>
>> Tried using local checkpointing as well
Hi,
I am writing a Spark Streaming job using the direct stream method for Kafka
and wanted to handle the case of checkpoint failure, when we'll have to
reprocess the entire data from the start. By default, for every new
checkpoint it tries to load everything from each partition and that takes a
lot
I also have the same use case as Augustus, and have some basic questions
about recovery from checkpoint. I have a 10 node Kafka cluster and a 30
node Spark cluster running a streaming job; how is the (topic, partition)
data handled in checkpointing? The scenario I want to understand is, in
case of
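For what it's worth, the metadata the direct stream checkpoints per batch is conceptually just per-(topic, partition) offset ranges, much like the fields of Spark's `OffsetRange` class. A loose, self-contained illustration with hypothetical offsets (this is not Spark's actual checkpoint format):

```scala
// Loose model of per-batch offset metadata; field names mirror
// org.apache.spark.streaming.kafka.OffsetRange, but this is not Spark code.
case class OffsetRange(topic: String, partition: Int,
                       fromOffset: Long, untilOffset: Long)

val batchOffsets = Seq(
  OffsetRange("events", 0, 1000L, 1500L),  // hypothetical offsets
  OffsetRange("events", 1, 980L, 1490L)
)

// On recovery, each partition is replayed exactly over its saved range:
val recordsPerPartition = batchOffsets
  .map(r => (r.topic, r.partition) -> (r.untilOffset - r.fromOffset))
  .toMap
println(recordsPerPartition)
```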
ing("Throwing this error\n")),
ok => ok
)
}
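The fragment above is cut off, but it reads like an `Either` fold; a complete, self-contained version of that pattern (names hypothetical) might look like:

```scala
// Hypothetical completion of the truncated fold: fail loudly on the
// error branch, pass the value through on the success branch.
def unwrap[A](result: Either[String, A]): A =
  result.fold(
    err => sys.error("Throwing this error\n"),
    ok => ok
  )

println(unwrap(Right(42)))  // prints 42
```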
On Thu, Sep 24, 2015 at 3:00 PM, Sourabh Chandak <sourabh3...@gmail.com>
wrote:
> I was able to get past this issue. I was pointing to the SSL port whereas
> SimpleConsumer should point to the PLAINTEXT port. But after fixing that
river memory, or put a profiler on it to see what's taking
> up heap.
>
>
>
> On Thu, Sep 24, 2015 at 3:51 PM, Sourabh Chandak <sourabh3...@gmail.com>
> wrote:
>
>> Adding Cody and Sriharsha
>>
>> On Thu, Sep 24, 2015 at 1:25 PM, Sourabh Chandak <sourab
Hi,
I have ported receiverless spark streaming for kafka to Spark 1.2 and am
trying to run a spark streaming job to consume data from my broker, but I
am getting the following error:
15/09/24 20:17:45 ERROR BoundedByteBufferReceive: OOME with size 352518400
java.lang.OutOfMemoryError: Java heap space
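352518400 bytes is roughly 336 MB in a single fetch buffer, which usually points at an oversized fetch size rather than a genuine need for more heap. A hedged sketch of the relevant old-consumer settings that can be passed via the direct stream's kafkaParams (values are guesses, not recommendations):

```properties
# Hypothetical kafkaParams entries; 352518400 bytes ~= 336 MB suggests the
# fetch buffer, not the data itself, is blowing up the heap.
fetch.message.max.bytes=1048576
socket.receive.buffer.bytes=1048576
```

Whether these names apply depends on the consumer in use; the new (0.9+) consumer uses `max.partition.fetch.bytes` instead.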
Adding Cody and Sriharsha
On Thu, Sep 24, 2015 at 1:25 PM, Sourabh Chandak <sourabh3...@gmail.com>
wrote:
> Hi,
>
> I have ported receiverless spark streaming for kafka to Spark 1.2 and am
> trying to run a spark streaming job to consume data from my broker, but I
> am
Can we use the existing kafka spark streaming jar to connect to a kafka
server running in SSL mode?
We are fine with a non-SSL consumer, as our kafka cluster and spark cluster
are in the same network.
Thanks,
Sourabh
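For context on the SSL question: SSL support arrived with the new consumer in Kafka 0.9, while the Spark streaming Kafka integration of that era talks to brokers via the old SimpleConsumer API, which only speaks PLAINTEXT. If an SSL-capable client were used, the settings would look roughly like this (paths and passwords are placeholders):

```properties
# Hypothetical client SSL settings for a Kafka 0.9+ consumer;
# all paths and passwords below are placeholders.
security.protocol=SSL
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=changeit
```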
On Fri, Aug 28, 2015 at 12:03 PM, Gwen Shapira <g...@confluent.io> wrote:
I can't
Kafka approach. That is quite flexible, can
give exactly-once guarantees without a WAL, and is more robust and performant.
Consider using it.
On Wed, Aug 5, 2015 at 5:48 PM, Sourabh Chandak <sourabh3...@gmail.com>
wrote:
Hi,
I am trying to replicate the Kafka Streaming Receiver for a custom
Hi,
I am trying to replicate the Kafka Streaming Receiver for a custom version
of Kafka and want to create a reliable receiver. The current implementation
uses BlockGenerator, which is a private class inside Spark Streaming, hence I
can't use it in my code. Can someone help me with some resources
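Since `BlockGenerator` is private, a custom reliable receiver has to do its own batching of records into blocks before handing each block to `store(...)` in one call (storing whole blocks is what lets the store call wait until the data is safely replicated). A self-contained sketch of just that batching logic, with hypothetical names — this is not Spark's `BlockGenerator`:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical record batcher: buffers records and hands off blocks of at
// most `blockSize` records to `onBlock` (which, in a real receiver, would
// call store(block) and then ack/commit upstream).
class SimpleBlockBatcher[T](blockSize: Int, onBlock: Seq[T] => Unit) {
  private var buffer = new ArrayBuffer[T]()

  def add(record: T): Unit = {
    buffer += record
    if (buffer.size >= blockSize) flush()
  }

  def flush(): Unit =
    if (buffer.nonEmpty) {
      onBlock(buffer.toList)
      buffer = new ArrayBuffer[T]()
    }
}

val blocks = ArrayBuffer.empty[Seq[Int]]
val batcher = new SimpleBlockBatcher[Int](3, b => blocks += b)
(1 to 7).foreach(batcher.add)
batcher.flush()  // emit the final partial block
println(blocks.map(_.mkString(",")).mkString(" | "))  // 1,2,3 | 4,5,6 | 7
```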
14 matches