Re: kafka: how to stop consumption temporarily

2020-01-08 Thread David Morin
Awesome!
I'm going to implement it.
Thanks a lot, Arvid.



Re: kafka: how to stop consumption temporarily

2020-01-08 Thread Arvid Heise
I'd second Chesnay's suggestion to use a custom source. It would be a piece
of cake with FLIP-27 [1], but unfortunately we are not there yet. It will
probably land in Flink 1.11 (mid-year) if you can wait.

The current way would be a source that wraps the two Kafka consumers and
blocks the normal (CDC) consumer from emitting elements while the dump is
running. Here is a quick and dirty solution that I threw together:
https://gist.github.com/AHeise/d7a8662f091e5a135c5ccfd6630634dd .

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
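
To make the gating idea concrete, here is a minimal sketch (not the gist itself;
class and field names are made up for illustration): a wrapper source runs a
high-priority "dump" delegate on the main thread and holds back everything the
CDC delegate tries to emit until the dump delegate's run() has returned. A real
implementation, like the linked gist, would also have to forward the
rich-function lifecycle (open/close, runtime context) and checkpointing to the
delegates, and would need a way to make the dump source actually terminate,
since a plain FlinkKafkaConsumer never returns from run().

import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.watermark.Watermark;

import java.util.concurrent.CountDownLatch;

/**
 * Sketch only: the dump delegate runs first, and the CDC delegate's output is
 * held back until the dump delegate's run() has returned.
 */
public class PrioritizedWrapperSource<T> implements SourceFunction<T> {

    private final SourceFunction<T> dumpSource; // high priority, assumed to finish
    private final SourceFunction<T> cdcSource;  // held back while the dump runs

    private transient CountDownLatch dumpFinished;

    public PrioritizedWrapperSource(SourceFunction<T> dumpSource, SourceFunction<T> cdcSource) {
        this.dumpSource = dumpSource;
        this.cdcSource = cdcSource;
    }

    @Override
    public void run(SourceContext<T> ctx) throws Exception {
        dumpFinished = new CountDownLatch(1);

        // The CDC delegate runs concurrently, but every element it tries to
        // emit blocks until the gate is opened.
        Thread cdcThread = new Thread(() -> {
            try {
                cdcSource.run(new GatedContext(ctx));
            } catch (Exception e) {
                throw new RuntimeException("CDC delegate source failed", e);
            }
        }, "gated-cdc-source");
        cdcThread.start();

        // Assumed to return once the dump is complete (a plain Kafka source
        // would have to be wrapped to stop at an end-of-dump marker).
        dumpSource.run(ctx);

        dumpFinished.countDown(); // dump done: open the gate for the CDC stream
        cdcThread.join();
    }

    @Override
    public void cancel() {
        dumpSource.cancel();
        cdcSource.cancel();
        if (dumpFinished != null) {
            dumpFinished.countDown(); // unblock the CDC thread so it can exit
        }
    }

    /** A SourceContext that blocks all emission until the dump has finished. */
    private class GatedContext implements SourceContext<T> {
        private final SourceContext<T> inner;

        GatedContext(SourceContext<T> inner) {
            this.inner = inner;
        }

        private void awaitDump() {
            try {
                dumpFinished.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        @Override public void collect(T element) { awaitDump(); inner.collect(element); }
        @Override public void collectWithTimestamp(T element, long ts) { awaitDump(); inner.collectWithTimestamp(element, ts); }
        @Override public void emitWatermark(Watermark mark) { awaitDump(); inner.emitWatermark(mark); }
        @Override public void markAsTemporarilyIdle() { inner.markAsTemporarilyIdle(); }
        @Override public Object getCheckpointLock() { return inner.getCheckpointLock(); }
        @Override public void close() { inner.close(); }
    }
}

// Usage sketch: env.addSource(new PrioritizedWrapperSource<>(dumpSource, cdcSource))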



Re: kafka: how to stop consumption temporarily

2020-01-06 Thread David Morin
My naive solution won't work because a dump can take quite a long time.
So yes, I have to find a way to stop consumption from the topic used for
streaming mode while a dump is being performed :(
Terry, I am trying to implement something based on your reply and on this
thread:
https://stackoverflow.com/questions/59201503/flink-kafka-gracefully-close-flink-consuming-messages-from-kafka-source-after-a
Any suggestions are welcome.
Thanks.

David



Re: kafka: how to stop consumption temporarily

2020-01-06 Thread David Morin
Hi,

Thanks for your replies.
Yes Terry, you are right, I can try to create a custom source.
But given my use case, I figured out that I may be able to use a technical
field in my data instead. It is a timestamp, and I think I just have to ignore
late events, either with watermarks or later in the pipeline based on metadata
stored in Flink state. I am testing it now...
Thx

David
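
A minimal sketch of the timestamp idea above, assuming a hypothetical record
shape with a key and a technical-timestamp field (all names made up here): keep
the highest timestamp seen so far for each key in keyed state and drop anything
older, so CDC events that were already superseded by the dump are ignored. The
watermark variant would instead assign the technical timestamp as event time
and rely on lateness handling.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/** Hypothetical record shape: a primary key plus the technical timestamp column. */
class ChangeEvent {
    public String key;
    public long technicalTs;
    public String payload;
}

public class DropStaleEvents extends KeyedProcessFunction<String, ChangeEvent, ChangeEvent> {

    private transient ValueState<Long> maxSeenTs;

    @Override
    public void open(Configuration parameters) {
        maxSeenTs = getRuntimeContext().getState(
                new ValueStateDescriptor<>("max-technical-ts", Long.class));
    }

    @Override
    public void processElement(ChangeEvent event, Context ctx, Collector<ChangeEvent> out)
            throws Exception {
        Long maxSeen = maxSeenTs.value();
        if (maxSeen != null && event.technicalTs < maxSeen) {
            return; // stale: this key was already written with a newer timestamp
        }
        maxSeenTs.update(event.technicalTs);
        out.collect(event);
    }
}

// Usage sketch: mergedStream.keyBy(e -> e.key).process(new DropStaleEvents())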



Re: kafka: how to stop consumption temporarily

2020-01-03 Thread Chesnay Schepler
Are you asking how to detect from within the job whether the dump is
complete, or how to combine these two jobs?


If you had a way to notice whether the dump is complete, then I would
suggest creating a custom source that wraps two Kafka sources and switches
between them at will based on your conditions.









Re: kafka: how to stop consumption temporarily

2020-01-02 Thread Terry Wang
Hi,

I’d like to share my opinion here. It seems that you need to adjust the Kafka
consumers so that they can communicate with each other. When you begin the dump
process, you need to notify the other consumer (the CDC-topic one) to wait idle.


Best,
Terry Wang






kafka: how to stop consumption temporarily

2020-01-02 Thread David Morin
Hi,

Is there a way to temporarily stop consuming one Kafka source in streaming
mode?
Use case: I have to consume two topics, but one of them has higher priority.
One of these topics is dedicated to ingesting data from a database (change data
capture) and the other is dedicated to synchronization (a dump, i.e. a SELECT ...
from the database). At the moment the dump is performed by a separate Flink job,
and we start it after manually stopping the CDC job.
I want to merge these two modes and automatically stop consumption of the topic
dedicated to the CDC mode while a dump is being performed.
How can I handle that with Flink in a streaming way? Backpressure? ...
Thanks in advance for your insights.

David
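
For context, merging the two modes into a single job would start from something
like the sketch below (topic names, group id and bootstrap servers are
placeholders); the open question of the thread is then how to keep the CDC
stream quiet while the dump topic is being replayed.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class MergedDumpAndCdcJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        props.setProperty("group.id", "merged-sync-job");     // placeholder

        // Hypothetical topic names: one carries the dump, the other the CDC stream.
        DataStream<String> dump = env.addSource(
                new FlinkKafkaConsumer<>("db-dump", new SimpleStringSchema(), props));
        DataStream<String> cdc = env.addSource(
                new FlinkKafkaConsumer<>("db-cdc", new SimpleStringSchema(), props));

        // Both modes in one pipeline; downstream logic (or a wrapper source)
        // then has to keep the CDC stream quiet while the dump is replayed.
        dump.union(cdc).print();

        env.execute("merged dump + CDC job");
    }
}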