Re: kafka: how to stop consumption temporarily
Awesome! I'll implement it. Thanks a lot, Arvid.

On Wed, Jan 8, 2020 at 12:00, Arvid Heise wrote:
Re: kafka: how to stop consumption temporarily
I'd second Chesnay's suggestion to use a custom source. It would be a piece of cake with FLIP-27 [1], but we are not there yet unfortunately. It's probably in Flink 1.11 (mid year) if you can wait.

The current way would be a source that wraps the two KafkaConsumers and blocks the normal consumer from outputting elements. Here is a quick and dirty solution that I threw together: https://gist.github.com/AHeise/d7a8662f091e5a135c5ccfd6630634dd

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface

On Mon, Jan 6, 2020 at 1:16 PM, David Morin wrote:
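The gating idea Arvid describes can be sketched independently of Flink's APIs. This is only a minimal model of the logic, not the actual gist: the class and queue names are hypothetical, and a real implementation would live inside a Flink SourceFunction rather than plain Python queues.

```python
import queue

DUMP_DONE = object()  # sentinel the dump producer pushes when it finishes

class GatedSource:
    """Wraps two record queues and prioritizes the dump: CDC records stay
    buffered (blocked) until the dump signals completion, then CDC resumes."""

    def __init__(self):
        self.dump_q = queue.Queue()   # records from the dump topic
        self.cdc_q = queue.Queue()    # records from the CDC topic (gated)

    def emit(self):
        # Phase 1: drain the dump topic until the completion sentinel.
        while True:
            rec = self.dump_q.get()
            if rec is DUMP_DONE:
                break
            yield rec
        # Phase 2: release the CDC records that were buffered meanwhile.
        while not self.cdc_q.empty():
            yield self.cdc_q.get()

src = GatedSource()
for r in ("cdc-1", "cdc-2"):
    src.cdc_q.put(r)                    # arrive during the dump, must wait
for r in ("dump-1", "dump-2", DUMP_DONE):
    src.dump_q.put(r)

out = list(src.emit())
print(out)  # dump records first, then the buffered CDC records
```

In a real job the CDC side would keep accumulating while blocked, so backpressure on the Kafka consumer (rather than unbounded buffering) is what makes this safe for a long dump.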
Re: kafka: how to stop consumption temporarily
My naive solution can't work because a dump can be quite long.
So, yes, I have to find a way to stop consumption of the topic used for streaming mode while a dump is running :(
Terry, I am trying to implement something based on your reply and on this thread: https://stackoverflow.com/questions/59201503/flink-kafka-gracefully-close-flink-consuming-messages-from-kafka-source-after-a
Any suggestions are welcome.
Thx.

David

On 2020/01/06 09:35:37, David Morin wrote:
Re: kafka: how to stop consumption temporarily
Hi,

Thanks for your replies.
Yes, Terry, you are right: I can try to create a custom source.
But perhaps, given my use case, I can instead use a technical field in my data. It is a timestamp, and I think I just have to ignore late events, either with watermarks or later in the pipeline based on metadata stored in Flink state. I am testing it now...
Thx

David

On 2020/01/03 15:44:08, Chesnay Schepler wrote:
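The "ignore late events" idea above can be sketched as a simple filter: drop any CDC record whose technical timestamp is older than the row the dump already wrote for the same key. All names here are hypothetical, and the dict stands in for what would be Flink keyed state (e.g. a ValueState per key) in the real pipeline.

```python
# key -> timestamp of the row written by the dump (stand-in for Flink keyed state)
snapshot_ts = {}

def keep(record):
    """True if the CDC record is newer than the dumped row for its key."""
    key, ts = record["key"], record["ts"]
    return ts > snapshot_ts.get(key, 0)

# The dump wrote rows "a" and "b" at t=100.
snapshot_ts.update({"a": 100, "b": 100})

events = [
    {"key": "a", "ts": 90},   # late: older than the dumped row -> dropped
    {"key": "a", "ts": 110},  # genuine update after the dump  -> kept
    {"key": "c", "ts": 50},   # key not covered by the dump     -> kept
]
kept = [e for e in events if keep(e)]
print([(e["key"], e["ts"]) for e in kept])
```

The appeal of this approach is that both sources can keep consuming; correctness comes from deduplication downstream rather than from pausing a consumer.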
Re: kafka: how to stop consumption temporarily
Are you asking how to detect from within the job whether the dump is complete, or how to combine these 2 jobs?

If you had a way to notice that the dump is complete, then I would suggest creating a custom source that wraps 2 Kafka sources and switches between them at will based on your conditions.

On 03/01/2020 03:53, Terry Wang wrote:
Re: kafka: how to stop consumption temporarily
Hi,

I'd like to share my opinion here. It seems that you need to adjust the Kafka consumers so that they can communicate with each other. When you begin the dump process, you need to notify the CDC-topic consumer to wait idle.

Best,
Terry Wang

On Jan 2, 2020, 16:49, David Morin wrote:
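The "notify the other consumer to wait idle" idea can be sketched with a shared flag. The names are hypothetical, and in a real Flink job the signal could not be an in-process flag: it would have to travel through an external channel (e.g. a control topic or a coordination service), since the two consumers may run in different tasks.

```python
import threading

dump_running = threading.Event()  # hypothetical shared "dump in progress" flag

def cdc_poll_step(fetch):
    """One poll iteration of the CDC consumer: stay idle while a dump runs."""
    if dump_running.is_set():
        return None         # wait idle, consume nothing this round
    return fetch()          # normal consumption

dump_running.set()                        # dump begins: CDC must pause
idle = cdc_poll_step(lambda: "rec-1")     # consumer idles
dump_running.clear()                      # dump finished: resume
resumed = cdc_poll_step(lambda: "rec-1")  # consumption resumes
print(idle, resumed)
```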
kafka: how to stop consumption temporarily
Hi,

Is there a way to temporarily stop consuming one Kafka source in streaming mode?
Use case: I have to consume 2 topics, but one of them has higher priority.
One of these topics is dedicated to ingesting data from a database (change data capture), and the other is dedicated to synchronization (a dump, i.e. a SELECT ... from the db). At the moment the dump is performed by one Flink job, and we start it after manually stopping the previous one (CDC).
I want to merge these 2 modes and automatically stop consumption of the CDC topic while a dump is running.
How can I handle that with Flink in a streaming way? Backpressure? ...
Thx in advance for your insights.

David