Re: Spark structured streaming -Kafka - deployment / monitor and restart

2020-07-06 Thread Jungtaek Lim
In SS, checkpointing is now part of running a micro-batch and is supported
natively. (To be clear, my library doesn't deal with the native
checkpointing behavior.)

In other words, it can't be customized the way you have been doing it with
your database. You probably don't need to do that with SS, but it still
depends on what you did with the offsets in the database.
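
For illustration, once a checkpoint location is set, a restarted query
resumes from the last committed micro-batch on its own. A minimal sketch
(the broker address, topic, and paths below are made-up placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("ss-kafka-checkpoint-sketch")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "events")                    // placeholder topic
      .load()

    val query = df.writeStream
      .format("parquet")
      .option("path", "/data/events")
      // Spark tracks offsets (and state) here natively; restarting with
      // the same checkpoint location resumes from the last committed
      // batch, so no external offset store is needed.
      .option("checkpointLocation", "/checkpoints/events")
      .start()

    query.awaitTermination()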


Re: Spark structured streaming -Kafka - deployment / monitor and restart

2020-07-06 Thread KhajaAsmath Mohammed
Thanks Lim, this is really helpful. I have a few questions.

Our earlier approach used a low-level consumer to read offsets from a
database and used that information to read with Spark Streaming (DStreams).
We saved the offsets back once processing finished, so we never lost data.
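
Roughly, the old flow looked like this (a sketch only; loadOffsetsFromDb
and saveOffsetsToDb stand in for our database code, and the broker address
and group ID are placeholders):

    import org.apache.kafka.common.TopicPartition
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Assign
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.{HasOffsetRanges, KafkaUtils, OffsetRange}

    val ssc = new StreamingContext(
      new SparkConf().setAppName("legacy-dstreams"), Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092",
      "key.deserializer" ->
        classOf[org.apache.kafka.common.serialization.StringDeserializer],
      "value.deserializer" ->
        classOf[org.apache.kafka.common.serialization.StringDeserializer],
      "group.id" -> "legacy-app")

    // Hypothetical stand-ins for the real database access code.
    def loadOffsetsFromDb(): Map[TopicPartition, Long] = Map.empty
    def saveOffsetsToDb(ranges: Array[OffsetRange]): Unit = ()

    val fromOffsets = loadOffsetsFromDb()
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Assign[String, String](fromOffsets.keys.toList, kafkaParams, fromOffsets))

    stream.foreachRDD { rdd =>
      // Record this batch's exact offset ranges before processing.
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process rdd ...
      saveOffsetsToDb(ranges) // persist only after processing succeeds
    }

    ssc.start()
    ssc.awaitTermination()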

With your library, will it automatically resume from the last offset it
processed when the application has been stopped or killed for some time?

Thanks,
Asmath


Re: Spark structured streaming -Kafka - deployment / monitor and restart

2020-07-05 Thread Jungtaek Lim
There are sections in the SS programming guide that answer exactly these
questions:

http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries
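
For instance, the handle returned by start() gives both control and
progress information, and a listener receives the same progress events
asynchronously. A minimal self-contained sketch (the rate source, query
name, and checkpoint path are made up for illustration):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._

    val spark = SparkSession.builder()
      .appName("ss-monitor-sketch")
      .getOrCreate()

    // Any streaming source works; "rate" needs no external system.
    val df = spark.readStream.format("rate").load()

    val query = df.writeStream
      .queryName("demo")
      .format("console")
      .option("checkpointLocation", "/checkpoints/demo")
      .start()

    println(query.status)       // trigger active? data available?
    println(query.lastProgress) // last batch's rows/durations/offsets
                                // (null until the first batch completes)

    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(e: QueryStartedEvent): Unit = ()
      override def onQueryProgress(e: QueryProgressEvent): Unit =
        println(s"${e.progress.name}: ${e.progress.numInputRows} rows")
      override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
    })

    query.stop() // graceful stop; starting again with the same
                 // checkpoint location resumes where it left off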

Also, for the Kafka data source, there's a third-party project (DISCLAIMER:
I'm the author) that helps you commit the offsets back to Kafka under a
specific group ID.

https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer

After that, you can also leverage the Kafka ecosystem to monitor progress
from Kafka's point of view, especially the gap between the highest offset
and the committed offset (i.e. consumer lag).
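
To show the shape of that idea without the project (this is NOT its actual
API, just a hand-rolled sketch; the group ID is made up and error handling
is omitted): a Kafka source reports each batch's end offsets as JSON in
the query progress, and a plain Kafka consumer can commit them.

    import java.util.Properties
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
    import org.apache.kafka.common.TopicPartition
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._
    import org.json4s._
    import org.json4s.jackson.JsonMethods.parse

    class OffsetCommitListener(bootstrap: String, groupId: String)
        extends StreamingQueryListener {
      implicit val formats: Formats = DefaultFormats

      private lazy val consumer = {
        val props = new Properties()
        props.put("bootstrap.servers", bootstrap)
        props.put("group.id", groupId)
        props.put("key.deserializer",
          "org.apache.kafka.common.serialization.ByteArrayDeserializer")
        props.put("value.deserializer",
          "org.apache.kafka.common.serialization.ByteArrayDeserializer")
        new KafkaConsumer[Array[Byte], Array[Byte]](props)
      }

      override def onQueryStarted(e: QueryStartedEvent): Unit = ()
      override def onQueryTerminated(e: QueryTerminatedEvent): Unit =
        consumer.close()

      override def onQueryProgress(e: QueryProgressEvent): Unit = {
        // A Kafka source's endOffset looks like {"topic":{"0":123,"1":456}}.
        val offsets = for {
          src <- e.progress.sources.toSeq if src.endOffset != null
          (topic, parts) <- parse(src.endOffset)
            .extract[Map[String, Map[String, Long]]]
          (part, off) <- parts
        } yield new TopicPartition(topic, part.toInt) -> new OffsetAndMetadata(off)
        if (offsets.nonEmpty) consumer.commitSync(offsets.toMap.asJava)
      }
    }

    // spark.streams.addListener(
    //   new OffsetCommitListener("broker:9092", "lag-monitor"))

With offsets committed under that group, standard Kafka tooling (e.g.
kafka-consumer-groups.sh --describe) can report the lag against the latest
offsets.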

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)



Re: Spark structured streaming -Kafka - deployment / monitor and restart

2020-07-05 Thread Gabor Somogyi
In 3.0 the community just added it (a dedicated Structured Streaming tab
in the web UI).

On Sun, 5 Jul 2020, 14:28 KhajaAsmath Mohammed, 
wrote:

> Hi,
>
> We are trying to move our existing code from Spark DStreams to Structured
> Streaming for an old application we built a few years ago.
>
> A Structured Streaming job doesn't have a Streaming tab in the Spark UI.
> Is there a way to monitor the jobs we submit with Structured Streaming?
> Since the job runs on every trigger, how can we kill the job and restart
> it if needed?
>
> Any suggestions on this, please?
>
> Thanks,
> Asmath
>
>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>