Re: Flink to get historical data from kafka between timespan t1 & t2

2021-01-13 Thread Aljoscha Krettek

On 2021/01/13 12:07, vinay.raic...@t-systems.com wrote:

OK. Attached is the PPT of what I am attempting to achieve w.r.t. time.

I hope I am all set to achieve the three bullets mentioned in the attached 
slide to create reports with the KafkaSource and KafkaSourceBuilder approach.


If you have any additional tips to share, please do so after going 
through the attached slide (example for the live-dashboards use case).


I think that should work with `KafkaSource`, yes. You just need to set 
the correct start timestamps and end timestamps, respectively. I believe 
that's all there is to it, off the top of my head I can't think of any 
additional tips.
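To make this concrete, here is a minimal sketch of the bounded read between t1 and t2 (untested; the topic name, bootstrap address, group id, and timestamp values are placeholders, and the exact builder method names may differ slightly between Flink versions):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HistoricalKafkaRead {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        long t1 = 1609459200000L; // start of the interval, epoch millis (example value)
        long t2 = 1610064000000L; // end of the interval, epoch millis (example value)

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")  // placeholder address
                .setTopics("tracking-data")         // placeholder topic
                .setGroupId("report-job")           // placeholder group id
                // start at the first offset whose timestamp is >= t1 ...
                .setStartingOffsets(OffsetsInitializer.timestamp(t1))
                // ... and stop at t2, which makes the source BOUNDED so the
                // job shuts down on its own once t2 is reached
                .setBounded(OffsetsInitializer.timestamp(t2))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-t1-t2")
           .print();

        env.execute("records between t1 and t2");
    }
}
```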


Best,
Aljoscha


RE: Flink to get historical data from kafka between timespan t1 & t2

2021-01-13 Thread VINAY.RAICHUR
OK. Attached is the PPT of what I am attempting to achieve w.r.t. time.

I hope I am all set to achieve the three bullets mentioned in the attached slide to 
create reports with the KafkaSource and KafkaSourceBuilder approach.

If you have any additional tips to share, please do so after going through the 
attached slide (example for the live-dashboards use case).

Kind regards
Vinay

-Original Message-
From: Aljoscha Krettek  
Sent: 13 January 2021 14:06
To: user@flink.apache.org
Cc: RAICHUR, VINAY 
Subject: Re: Flink to get historical data from kafka between timespan t1 & t2

On 2021/01/13 07:58, vinay.raic...@t-systems.com wrote:
>Not sure about your proposal regarding Point 3:
>*  firstly, how is it ensured that the stream is closed? If I understand 
>the doc correctly, the stream will be established starting with the 
>latest timestamp (hmm... is it not a standard behaviour?) and will 
>never finish (UNBOUNDED);

On the first question of standard behaviour: the default is to start from the 
group offsets that are available in Kafka. This uses the configured consumer 
group. I think it's better to be explicit, though, and specify something like 
`EARLIEST` or `LATEST`, etc.

And yes, the stream will start but never stop with this version of the Kafka 
connector. Only when you use the new `KafkaSource` can you also specify an end 
timestamp that will make the Kafka source shut down eventually.

>*  secondly, it is still not clear how to get the latest event at a 
>given time point in the past?

You are referring to getting a single record, correct? I don't think this is 
possible with Flink. All you can do is get a stream from Kafka that is 
potentially bounded by a start timestamp and/or end timestamp.

Best,
Aljoscha


Positioning_Use_Cases_TrackingData_Past_Now.pptx
Description: Positioning_Use_Cases_TrackingData_Past_Now.pptx


Re: Flink to get historical data from kafka between timespan t1 & t2

2021-01-13 Thread Aljoscha Krettek

On 2021/01/13 07:58, vinay.raic...@t-systems.com wrote:

Not sure about your proposal regarding Point 3:
*	firstly, how is it ensured that the stream is closed? If I understand 
the doc correctly, the stream will be established starting with the 
latest timestamp (hmm... is it not a standard behaviour?) and will 
never finish (UNBOUNDED);


On the first question of standard behaviour: the default is to start 
from the group offsets that are available in Kafka. This uses the 
configured consumer group. I think it's better to be explicit, though, 
and specify something like `EARLIEST` or `LATEST`, etc.
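With the current connector, being explicit could look like this (a sketch; the topic, bootstrap address, and group id are placeholders):

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

// ... inside the job setup:
Properties props = new Properties();
props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
props.setProperty("group.id", "report-job");          // placeholder

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("tracking-data", new SimpleStringSchema(), props);

// Be explicit instead of relying on the committed group offsets (the default):
consumer.setStartFromEarliest();       // or consumer.setStartFromLatest();
// consumer.setStartFromGroupOffsets(); // this is the default behaviour
```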


And yes, the stream will start but never stop with this version of the 
Kafka connector. Only when you use the new `KafkaSource` can you also 
specify an end timestamp that will make the Kafka source shut down 
eventually.


*	secondly, it is still not clear how to get the latest event at a 
given time point in the past?


You are referring to getting a single record, correct? I don't think 
this is possible with Flink. All you can do is get a stream from Kafka 
that is potentially bounded by a start timestamp and/or end timestamp.
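If a single record is really needed, one workaround sketch is to bound the stream at the time point t and reduce it to the record with the largest timestamp. This assumes each record carries its own event timestamp; `Event` and `getTimestamp()` are hypothetical names for illustration:

```java
// Assuming `events` is a bounded DataStream<Event> ending at time point t
// (e.g. built with KafkaSource and OffsetsInitializer.timestamp(t) as the
// stopping offset), reduce it down to the single newest record.
DataStream<Event> latest = events
        .keyBy(e -> 0) // one global key, so all records meet in the same reduce
        .reduce((a, b) -> a.getTimestamp() >= b.getTimestamp() ? a : b);
// The last element emitted by `latest` is the newest event at or before t.
```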


Best,
Aljoscha


RE: Flink to get historical data from kafka between timespan t1 & t2

2021-01-12 Thread VINAY.RAICHUR
Hi Aljoscha

Not sure about your proposal regarding Point 3:
*   firstly, how is it ensured that the stream is closed? If I understand 
the doc correctly, the stream will be established starting with the latest 
timestamp (hmm... is it not a standard behaviour?) and will never finish 
(UNBOUNDED);
*   secondly, it is still not clear how to get the latest event at a given 
time point in the past?

Please help me with answers for the above

Thanks & regards
Vinay

-Original Message-
From: Aljoscha Krettek  
Sent: 08 January 2021 19:26
To: user@flink.apache.org
Subject: Re: Flink to get historical data from kafka between timespan t1 & t2

Hi,
for your point 3. you can look at
`FlinkKafkaConsumerBase.setStartFromTimestamp(...)`.

Points 1. and 2. will not work with the well-established `FlinkKafkaConsumer`. 
However, it should be possible to do it with the new `KafkaSource` that was 
introduced in Flink 1.12. It might be a bit rough around the edges, though.

With the `KafkaSource` you can specify `OffsetInitializers` for both the 
starting and stopping offset of the source. Take a look at `KafkaSource`, 
`KafkaSourceBuilder`, and `OffsetInitializers` in the code.

I hope this helps.

Best,
Aljoscha

On 2021/01/08 07:51, vinay.raic...@t-systems.com wrote:
>Hi Flink Community Team,
>
>This is a desperate request for your help on below.
>
>I am new to Flink and trying to use it with Kafka for event-based data 
>stream processing in my project. I am struggling to find solutions with 
>Flink for my project requirements below:
>
>
>  1.  Get all Kafka topic records at a given time point 't' (now or in 
>the past). Also, how to pull only the latest record from Kafka using Flink?
>  2.  Get all records from Kafka for a given time interval in the past, 
> between the t1 & t2 time period.
>  3.  Continuously get data from Kafka starting at a given time point (now 
> or in the past); the client will actively cancel/close the data stream. 
> Example: live dashboards. How to do this using Flink?
>Please provide me a sample Flink code snippet for pulling data from Kafka for 
>the above three requirements. I have been stuck for the last month without 
>much progress, and your timely help will be a saviour for me!
>Thanks & Regards,
>Vinay Raichur
>T-Systems India<https://www.t-systems.com/in/en> | Digital Solutions
>Mail: vinay.raic...@t-systems.com<mailto:vinay.raic...@t-systems.com>
>Mobile: +91 9739488992
>


RE: Flink to get historical data from kafka between timespan t1 & t2

2021-01-12 Thread VINAY.RAICHUR
Hi Team & Aljoscha

Any update on below please?

-Original Message-
From: RAICHUR, VINAY 
Sent: 11 January 2021 19:43
To: 'Aljoscha Krettek' ; user@flink.apache.org
Subject: RE: Flink to get historical data from kafka between timespan t1 & t2
Importance: High

Hi Aljoscha,
Currently in our set-up we are using the Docker container version of Flink 1.11.2 
along with a containerized version of Kafka running on our Kubernetes cluster, as 
seen in the screenshot (.png file) of our Vagrant VM attached to this mail.

a) As mentioned by you, "KafkaSource" was introduced in Flink 1.12, so I suppose 
we have to upgrade to this version of Flink. Can you share the link to the 
stable Flink image (containerized version) to be used in our set-up, keeping in 
mind we are to use KafkaSource?

b) Also kindly share details of the corresponding compatible version of "Kafka" 
(containerized version) to be used for making this work, as I understand there 
is a constraint on the version of Kafka required to make it work.

Kind regards,
Vinay

-Original Message-
From: Aljoscha Krettek  
Sent: 11 January 2021 16:06
To: user@flink.apache.org
Cc: RAICHUR, VINAY 
Subject: Re: Flink to get historical data from kafka between timespan t1 & t2

On 2021/01/08 16:55, vinay.raic...@t-systems.com wrote:
>Could you also attach the code snippet for `KafkaSource`, `KafkaSourceBuilder`, 
>and `OffsetInitializers` that you were referring to in your previous reply, 
>for my reference, to make it clearer for me.

Ah sorry, I was referring to the Flink code there. You can start with 
`KafkaSource` [1], which has an example block in its Javadocs that shows how 
to use it.

[1]
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/KafkaSource.java


Re: Flink to get historical data from kafka between timespan t1 & t2

2021-01-12 Thread Aljoscha Krettek

On 2021/01/11 14:12, vinay.raic...@t-systems.com wrote:
a) As mentioned by you, "KafkaSource" was introduced in Flink 1.12, so I 
suppose we have to upgrade to this version of Flink. Can you share the 
link to the stable Flink image (containerized version) to be used in 
our set-up, keeping in mind we are to use KafkaSource?


Unfortunately, there is a problem with publishing those images. You can 
check out [1] to follow progress. In the meantime you can try and build 
the image yourself or use the one that Robert pushed to his private 
account.


b) Also kindly share details of the corresponding compatible version of 
"Kafka" (containerized version) to be used for making this work, as I 
understand there is a constraint on the version of Kafka required to make 
it work.


I believe Kafka is backwards compatible since at least version 1.0, so 
any recent enough version should work.


Best,
Aljoscha


Re: Flink to get historical data from kafka between timespan t1 & t2

2021-01-11 Thread Aljoscha Krettek

On 2021/01/08 16:55, vinay.raic...@t-systems.com wrote:

Could you also attach the code snippet for `KafkaSource`, `KafkaSourceBuilder`, 
and `OffsetInitializers` that you were referring to in your previous reply, for 
my reference, to make it clearer for me.


Ah sorry, I was referring to the Flink code there. You can start with 
`KafkaSource` [1], which has an example block in its Javadocs that shows 
how to use it.


[1] 
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/KafkaSource.java


RE: Flink to get historical data from kafka between timespan t1 & t2

2021-01-08 Thread VINAY.RAICHUR
Thanks Aljoscha for your prompt response. It means a lot to me 

Could you also attach the code snippet for `KafkaSource`, `KafkaSourceBuilder`, 
and `OffsetInitializers` that you were referring to in your previous reply, for 
my reference, to make it clearer for me.

Kind regards,
Vinay

-Original Message-
From: Aljoscha Krettek  
Sent: 08 January 2021 19:26
To: user@flink.apache.org
Subject: Re: Flink to get historical data from kafka between timespan t1 & t2

Hi,

for your point 3. you can look at
`FlinkKafkaConsumerBase.setStartFromTimestamp(...)`.

Points 1. and 2. will not work with the well-established `FlinkKafkaConsumer`. 
However, it should be possible to do it with the new `KafkaSource` that was 
introduced in Flink 1.12. It might be a bit rough around the edges, though.

With the `KafkaSource` you can specify `OffsetInitializers` for both the 
starting and stopping offset of the source. Take a look at `KafkaSource`, 
`KafkaSourceBuilder`, and `OffsetInitializers` in the code.

I hope this helps.

Best,
Aljoscha

On 2021/01/08 07:51, vinay.raic...@t-systems.com wrote:
>Hi Flink Community Team,
>
>This is a desperate request for your help on below.
>
>I am new to Flink and trying to use it with Kafka for event-based data 
>stream processing in my project. I am struggling to find solutions with 
>Flink for my project requirements below:
>
>
>  1.  Get all Kafka topic records at a given time point 't' (now or in 
>the past). Also, how to pull only the latest record from Kafka using Flink?
>  2.  Get all records from Kafka for a given time interval in the past, 
> between the t1 & t2 time period.
>  3.  Continuously get data from Kafka starting at a given time point (now 
> or in the past); the client will actively cancel/close the data stream. 
> Example: live dashboards. How to do this using Flink?
>Please provide me a sample Flink code snippet for pulling data from Kafka for 
>the above three requirements. I have been stuck for the last month without 
>much progress, and your timely help will be a saviour for me!
>Thanks & Regards,
>Vinay Raichur
>T-Systems India<https://www.t-systems.com/in/en> | Digital Solutions
>Mail: vinay.raic...@t-systems.com<mailto:vinay.raic...@t-systems.com>
>Mobile: +91 9739488992
>


Re: Flink to get historical data from kafka between timespan t1 & t2

2021-01-08 Thread Aljoscha Krettek

Hi,

for your point 3. you can look at 
`FlinkKafkaConsumerBase.setStartFromTimestamp(...)`.
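A sketch of that usage (untested; the topic, bootstrap address, timestamp value, and the `env` StreamExecutionEnvironment variable are placeholders):

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

// ... inside the job setup, with `env` a StreamExecutionEnvironment:
Properties props = new Properties();
props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("tracking-data", new SimpleStringSchema(), props);

// Start reading at the first record whose timestamp is at or after the
// given point in time; the stream is still unbounded (a start but no end),
// which fits the live-dashboard case where the client cancels the stream.
consumer.setStartFromTimestamp(1609459200000L); // example epoch millis

env.addSource(consumer).print();
```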


Points 1. and 2. will not work with the well-established 
`FlinkKafkaConsumer`. However, it should be possible to do it with the 
new `KafkaSource` that was introduced in Flink 1.12. It might be a bit 
rough around the edges, though.


With the `KafkaSource` you can specify `OffsetInitializers` for both the 
starting and stopping offset of the source. Take a look at 
`KafkaSource`, `KafkaSourceBuilder`, and `OffsetInitializers` in the 
code.


I hope this helps.

Best,
Aljoscha

On 2021/01/08 07:51, vinay.raic...@t-systems.com wrote:

Hi Flink Community Team,

This is a desperate request for your help on below.

I am new to Flink and trying to use it with Kafka for event-based data 
stream processing in my project. I am struggling to find solutions with 
Flink for my project requirements below:


 1.  Get all Kafka topic records at a given time point 't' (now or in the 
past). Also, how to pull only the latest record from Kafka using Flink?
 2.  Get all records from Kafka for a given time interval in the past, 
between the t1 & t2 time period.
 3.  Continuously get data from Kafka starting at a given time point (now 
or in the past); the client will actively cancel/close the data stream. 
Example: live dashboards. How to do this using Flink?

Please provide me a sample Flink code snippet for pulling data from Kafka for 
the above three requirements. I have been stuck for the last month without much 
progress, and your timely help will be a saviour for me!
Thanks & Regards,
Vinay Raichur
T-Systems India | Digital Solutions
Mail: vinay.raic...@t-systems.com
Mobile: +91 9739488992



Flink to get historical data from kafka between timespan t1 & t2

2021-01-07 Thread VINAY.RAICHUR
Hi Flink Community Team,

This is a desperate request for your help on below.

I am new to Flink and trying to use it with Kafka for event-based data 
stream processing in my project. I am struggling to find solutions with 
Flink for my project requirements below:


  1.  Get all Kafka topic records at a given time point 't' (now or in the 
past). Also, how to pull only the latest record from Kafka using Flink?
  2.  Get all records from Kafka for a given time interval in the past, 
between the t1 & t2 time period.
  3.  Continuously get data from Kafka starting at a given time point (now 
or in the past); the client will actively cancel/close the data stream. 
Example: live dashboards. How to do this using Flink?

Please provide me a sample Flink code snippet for pulling data from Kafka for 
the above three requirements. I have been stuck for the last month without much 
progress, and your timely help will be a saviour for me!
Thanks & Regards,
Vinay Raichur
T-Systems India | Digital Solutions
Mail: vinay.raic...@t-systems.com
Mobile: +91 9739488992