Re: Spark structured streaming -Kafka - deployment / monitor and restart
In SS, checkpointing is now part of running each micro-batch and is supported natively. (To be clear, my library doesn't deal with the native checkpointing behavior.) In other words, it can't be customized the way you have been doing with your database. You probably don't need to do that with SS, but it still depends on what you did with the offsets in the database.
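To illustrate the native mechanism described above, here is a minimal sketch of a Kafka-to-file query with checkpointing enabled; the broker address, topic, and paths are placeholders rather than anything from this thread:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ss-kafka-checkpoint-sketch")
  .getOrCreate()

// Source: offsets are tracked by the query itself, not by the application.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder
  .option("subscribe", "events")                    // placeholder topic
  .load()

val query = df.writeStream
  .format("parquet")
  .option("path", "/data/events-out")               // placeholder sink path
  // Native checkpointing: Spark records offsets and commit state here, and a
  // restart with the same location resumes from the last committed batch.
  .option("checkpointLocation", "/checkpoints/events-query")
  .start()

query.awaitTermination()
```

Restarting the application with the same checkpointLocation resumes from the last committed micro-batch, which is why the external offset store from the DStreams approach is usually unnecessary.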
Re: Spark structured streaming -Kafka - deployment / monitor and restart
Thanks Lim, this is really helpful. I have a few questions.

Our earlier approach used a low-level consumer to read offsets from a database and used that information to read with Spark Streaming (DStreams), saving the offsets back once processing finished. This way we never lost data.

With your library, will it automatically process from the last offset it processed when the application was stopped or killed for some time?

Thanks,
Asmath
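For comparison, a hedged sketch of the DStreams-era pattern described above, following the Kafka direct-stream integration guide; loadOffsets and saveOffsets are hypothetical stand-ins for the database I/O, and the broker, topic, and group ID are placeholders:

```scala
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Assign
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

// Stand-ins for the database I/O described above (not shown in the thread).
def loadOffsets(): Map[TopicPartition, Long] =
  Map(new TopicPartition("events", 0) -> 0L) // e.g. SELECT topic, partition, offset ...
def saveOffsets(ranges: Array[OffsetRange]): Unit =
  ranges.foreach(r => println(s"${r.topic}-${r.partition}: ${r.untilOffset}")) // e.g. UPSERT ...

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",   // placeholder
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "legacy-app",             // placeholder
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val ssc = new StreamingContext(new SparkConf().setAppName("dstreams-offsets"), Seconds(30))
val fromOffsets = loadOffsets()

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  // Resume exactly where the database says processing left off.
  Assign[String, String](fromOffsets.keys.toList, kafkaParams, fromOffsets)
)

stream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process rdd ...
  saveOffsets(ranges) // persist offsets only after processing succeeds
}

ssc.start()
ssc.awaitTermination()
```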
Re: Spark structured streaming -Kafka - deployment / monitor and restart
There are sections in the SS programming guide that answer exactly these questions:

http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries

Also, for the Kafka data source, there's a third-party project (DISCLAIMER: I'm the author) to help you commit the offsets to Kafka with a specific group ID:

https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer

After that, you can also leverage the Kafka ecosystem to monitor progress from Kafka's point of view, especially the gap between the highest offset and the committed offset.

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)
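As a sketch of the native monitoring hook covered by the linked guide (assuming an existing SparkSession named spark; the logging is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark = SparkSession.builder().getOrCreate()

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit =
    println(s"Query started: ${event.id}")

  override def onQueryProgress(event: QueryProgressEvent): Unit =
    // The progress JSON carries per-trigger metrics, including the Kafka
    // start/end offsets for each source.
    println(event.progress.json)

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
    println(s"Query terminated: ${event.id}")
})
```

Once the committer project has written offsets back under a group ID, standard Kafka tooling such as kafka-consumer-groups.sh --describe can then report the per-partition lag Jungtaek mentions.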
Re: Spark structured streaming -Kafka - deployment / monitor and restart
In 3.0 the community just added it (a Structured Streaming tab in the web UI).

On Sun, 5 Jul 2020, 14:28 KhajaAsmath Mohammed wrote:

> Hi,
>
> We are trying to move our existing code from Spark DStreams to Structured
> Streaming for one of the old applications, which we built a few years ago.
>
> A Structured Streaming job doesn't have a Streaming tab in the Spark UI. Is
> there a way to monitor the jobs we submit in Structured Streaming? Since the
> job runs on every trigger, how can we kill the job and restart it if needed?
>
> Any suggestions on this, please.
>
> Thanks,
> Asmath
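On the kill-and-restart question, a small sketch of the query-management API from the guide's managing-streaming-queries section; the query name "events-query" is a placeholder and assumes the query was started with .queryName(...):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// List the queries running in this session, with their latest progress.
spark.streams.active.foreach { q =>
  println(s"${q.name} (${q.id}): isActive=${q.isActive}")
  println(q.lastProgress) // null until the first trigger completes
}

// Stop ("kill") one query without taking down the whole application.
spark.streams.active.find(_.name == "events-query").foreach(_.stop())

// A later restart that reuses the same checkpointLocation resumes from the
// last committed offsets, so stop/restart is safe.
```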