If by smaller block interval you mean the batch duration in seconds passed to
the StreamingContext constructor, then no.  You'll still get everything from
the starting offset up to now in the first batch.
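
As mentioned further down in the thread, spark.streaming.kafka.maxRatePerPartition
is what caps the size of that first batch. Roughly something like this (an
untested sketch; the topic, broker, rate, and batch duration are placeholders
you'd adjust for your setup):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setAppName("kafka-catch-up")
  // For the direct stream, cap each Kafka partition at 1000 records/sec,
  // so a 30-second batch holds at most 1000 * 30 records per partition,
  // even for the first batch after a long downtime.
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")

val ssc = new StreamingContext(conf, Seconds(30))

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("mytopic"))

stream.map(_._2).foreachRDD { rdd =>
  // your normal processing here
}

ssc.start()
ssc.awaitTermination()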

On Thu, Feb 18, 2016 at 10:02 AM, praveen S <mylogi...@gmail.com> wrote:

> Sorry, rephrasing: can this issue be resolved by having a smaller block
> interval?
>
> Regards,
> Praveen
> On 18 Feb 2016 21:30, "praveen S" <mylogi...@gmail.com> wrote:
>
>> Can having a smaller block interval alone resolve this?
>>
>> Regards,
>> Praveen
>> On 18 Feb 2016 21:13, "Cody Koeninger" <c...@koeninger.org> wrote:
>>
>>> Backpressure won't help you with the first batch; you'd need
>>> spark.streaming.kafka.maxRatePerPartition for that.
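>>>
>>> e.g. (just a sketch; assuming conf is the SparkConf you build your
>>> StreamingContext from, and 1000 is a placeholder rate):
>>>
>>>   conf.set("spark.streaming.kafka.maxRatePerPartition", "1000")
>>>
>>> With the direct stream that bounds every batch, including the first one
>>> after a restart, at roughly maxRatePerPartition * batch duration in
>>> seconds * number of Kafka partitions records.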
>>>
>>> On Thu, Feb 18, 2016 at 9:40 AM, praveen S <mylogi...@gmail.com> wrote:
>>>
>>>> Have a look at the spark.streaming.backpressure.enabled property.
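>>>>
>>>> For example (sketch, assuming conf is your SparkConf):
>>>>
>>>>   conf.set("spark.streaming.backpressure.enabled", "true")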
>>>>
>>>> Regards,
>>>> Praveen
>>>> On 18 Feb 2016 00:13, "Abhishek Anand" <abhis.anan...@gmail.com> wrote:
>>>>
>>>>> I have a Spark Streaming application running in production. I am
>>>>> trying to find a solution for a particular use case where my application
>>>>> has a downtime of, say, 5 hours and is then restarted. When I start my
>>>>> streaming application after those 5 hours there would be a considerable
>>>>> amount of data in Kafka, and my cluster would be unable to repartition
>>>>> and process all of it at once.
>>>>>
>>>>> Is there any workaround so that when my streaming application starts,
>>>>> it first takes the data for 1-2 hours and processes it, then takes the
>>>>> data for the next hour and processes it, and so on? Once it has finished
>>>>> processing the 5 hours of data that were missed, normal streaming should
>>>>> resume with the given slide interval.
>>>>>
>>>>> Please suggest any ideas and whether this is feasible.
>>>>>
>>>>>
>>>>> Thanks !!
>>>>> Abhi
>>>>>
>>>>
>>>
