Checkpointing with Kafka streaming

2016-02-22 Thread p pathiyil
Hi, While using Kafka streaming with the direct API, does the checkpoint consist of more than the Kafka offsets if no 'window' operations are being used? Is there any facility to check the contents of the checkpoint files? Thanks.
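
For context: with the direct API the checkpoint holds the serialized DStream graph and configuration along with the per-batch Kafka offset ranges, and the files are Java-serialized rather than meant for direct inspection. Below is a minimal sketch (Spark 1.x / Kafka 0.8 direct API; the checkpoint path, broker and topic are placeholders) of logging the offset ranges the direct stream tracks per batch, which is the offset state that ends up in the checkpoint:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    val conf = new SparkConf().setAppName("checkpoint-offsets")
    val ssc = new StreamingContext(conf, Seconds(2))
    ssc.checkpoint("hdfs:///tmp/checkpoints")               // placeholder directory

    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")  // placeholder
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))                     // placeholder topic

    stream.foreachRDD { rdd =>
      // Offset ranges the direct stream tracked for this batch; this is the
      // offset information that gets checkpointed alongside the DStream graph.
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach(r => println(s"${r.topic}/${r.partition}: ${r.fromOffset} -> ${r.untilOffset}"))
    }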

Fair Scheduler Pools with Kafka Streaming

2016-02-16 Thread p pathiyil
Hi, I am trying to use Fair Scheduler Pools with Kafka Streaming, assigning each Kafka partition to its own pool. The goal is to give each partition an equal share of compute time irrespective of the number of messages each partition receives in a time window. However, I do not see fair ...
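
For reference, the usual wiring for the setup described above looks like the sketch below: the scheduler mode set to FAIR, pools defined in a fairscheduler.xml allocation file, and each job-submitting thread selecting its pool via a thread-local property (the pool name, partition id and file path here are illustrative, not from the original post):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("fair-pools")
      .set("spark.scheduler.mode", "FAIR")
      .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")  // placeholder path
    val sc = new SparkContext(conf)

    // In each thread that submits jobs for one partition's data, select that
    // partition's pool (pools and their weights are defined in fairscheduler.xml):
    val partitionId = 0                                                      // illustrative
    sc.setLocalProperty("spark.scheduler.pool", s"partition-$partitionId")

Note that spark.scheduler.pool is a thread-local property, so it only applies to jobs submitted from the thread that set it.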

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-12 Thread p pathiyil
n Piu" <sebastian@gmail.com> wrote: > Have you tried using fair scheduler and queues > On 12 Feb 2016 4:24 a.m., "p pathiyil" <pathi...@gmail.com> wrote: > >> With this setting, I can see that the next job is being executed before >> the previous

Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread p pathiyil
Hi, I am looking for a way to isolate the processing of messages from each Kafka partition within the same driver. Scenario: a DStream is created with the createDirectStream call by passing in a few partitions. Let us say that the streaming context is defined with a batch duration of 2 seconds.
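
One way to get this kind of isolation (and the shape the fair-scheduler suggestion in the replies assumes) is to create a separate direct stream per partition, so each partition produces its own jobs that can be scheduled independently. A sketch against the Spark 1.x / Kafka 0.8 direct API; the topic name, partition count and starting offsets are placeholders, and ssc / kafkaParams are as in the earlier sketch:

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // One direct stream per partition, so a slow partition's jobs do not hold
    // up processing of the others.
    val numPartitions = 4                                         // placeholder
    val streams = (0 until numPartitions).map { p =>
      val fromOffsets = Map(TopicAndPartition("mytopic", p) -> 0L)  // placeholder offsets
      KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder,
        (String, String)](
        ssc, kafkaParams, fromOffsets,
        (m: MessageAndMetadata[String, String]) => (m.key, m.message))
    }
    streams.foreach(_.foreachRDD { rdd =>
      // per-partition processing, optionally under its own scheduler pool
    })

By default Spark Streaming still runs one job at a time, so letting these per-partition jobs actually overlap also requires raising spark.streaming.concurrentJobs above 1, which may be the setting referred to in the reply above.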

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread p pathiyil
... how-jobs-are-assigned-to-executors-in-spark-streaming > On Thu, Feb 11, 2016 at 9:33 AM, p pathiyil <pathi...@gmail.com> wrote: >> Thanks for the response Cody. >> The producers are out of my control, so can't really balance the incoming content ...

Spark Streaming with Kafka - batch DStreams in memory

2016-02-01 Thread p pathiyil
Hi, Are there any ways to store DStreams / RDDs read from Kafka in memory so they can be processed at a later time? What we need to do is read data from Kafka, key it by some attribute that is present in the Kafka messages, and write out the data related to each key when we have ...
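
One stock way to buffer Kafka data in memory keyed by a message attribute until some flush condition is met is updateStateByKey, which keeps per-key state across batches (it requires a checkpoint directory to be set). A rough sketch; the key extractor, flush threshold and sink are hypothetical stand-ins, and stream is the (key, message) DStream from createDirectStream:

    import org.apache.spark.streaming.dstream.DStream

    // Re-key each message by an attribute extracted from its payload.
    val keyed: DStream[(String, String)] =
      stream.map { case (_, msg) => (extractKey(msg), msg) }  // extractKey is hypothetical

    // Accumulate all messages seen so far for each key, across batches.
    val buffered = keyed.updateStateByKey[Seq[String]] {
      (newValues: Seq[String], state: Option[Seq[String]]) =>
        Some(state.getOrElse(Seq.empty) ++ newValues)
    }

    val flushThreshold = 100                                  // illustrative condition
    buffered.foreachRDD { rdd =>
      rdd.filter { case (_, values) => values.size >= flushThreshold }
         .foreachPartition(writeOut)                          // writeOut is a hypothetical sink
    }

In a full implementation the update function would return None for keys that have been flushed, so their state is dropped; otherwise the buffered state grows without bound, so some cap or timeout is advisable.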