Best way to link static data to event data?

2019-09-27 Thread John Smith
Using Flink 1.8.

I have a list of phone area codes, cities, and their geo locations in a CSV
file, and my events from Kafka contain phone numbers.

I want to parse the phone number, get its area code, and then associate the
phone number with a city and geo location, as well as count how many numbers
are in that city/geo location.


Re: Best way to link static data to event data?

2019-09-27 Thread Oytun Tez
Hi,

You should look at the broadcast state pattern in the Flink docs.
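
For reference, a minimal sketch of what that pattern could look like with the
DataStream API. The POJOs (Event, AreaCodeRow, EnrichedEvent), their accessors,
and the two stream variables are assumptions for illustration, not actual code
from this thread:

    // Descriptor for the broadcast state holding the CSV rows, keyed by area code
    MapStateDescriptor<String, AreaCodeRow> descriptor =
            new MapStateDescriptor<>("area-codes", String.class, AreaCodeRow.class);

    // areaCodeStream carries the CSV rows; eventStream carries the Kafka events
    BroadcastStream<AreaCodeRow> broadcast = areaCodeStream.broadcast(descriptor);

    DataStream<EnrichedEvent> enriched = eventStream
        .connect(broadcast)
        .process(new BroadcastProcessFunction<Event, AreaCodeRow, EnrichedEvent>() {

            @Override
            public void processElement(Event event, ReadOnlyContext ctx,
                                       Collector<EnrichedEvent> out) throws Exception {
                // enrich each event by looking up its area code in broadcast state
                AreaCodeRow row = ctx.getBroadcastState(descriptor).get(event.getAreaCode());
                if (row != null) {
                    out.collect(new EnrichedEvent(event, row.getCity(), row.getGeo()));
                }
            }

            @Override
            public void processBroadcastElement(AreaCodeRow row, Context ctx,
                                                Collector<EnrichedEvent> out) throws Exception {
                // every parallel instance receives each CSV row and stores it
                ctx.getBroadcastState(descriptor).put(row.getAreaCode(), row);
            }
        });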

---
Oytun Tez

*M O T A W O R D*
The World's Fastest Human Translation Platform.
oy...@motaword.com — www.motaword.com




Re: Best way to link static data to event data?

2019-09-27 Thread John Smith
I don't think I need state for this...

I need to load a CSV, I'm guessing as a table, and then filter my events,
parse the number, transform the event into geolocation data, and sink that to
the downstream data source.

So I'm guessing I need a CSV source and my Kafka source, and somehow join
those to transform the event...



Re: Best way to link static data to event data?

2019-09-27 Thread Sameer W
Connected streams are one option, but may be overkill in your scenario if
your CSV does not refresh. If your CSV is small enough (number of records
wise), you could parse it, load it into a serializable object, and pass that
to the constructor of the operator where you will be streaming the data.

If the CSV can be made available via a shared network folder (or S3 in the
case of AWS), you could also read it in the open function (if you use the Rich
versions of the operator).

The real problem, I guess, is how frequently the CSV updates. If you want the
updates to propagate in near real time (or on a schedule), option 1 (parse in
the driver and send it via the constructor) does not work. Also, with the
second option you are responsible for refreshing the file read from the
shared folder.

In that case, use connected streams, where the stream reading in the file
(the other stream reads the events) periodically re-reads the file and sends
it down the stream. The refresh interval is your tolerance for stale data in
the CSV.
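
To illustrate the second option, here is a rough sketch of a RichMapFunction
that loads a small CSV from a shared path in open(). The path, the column
layout, and the Event/EnrichedEvent POJOs are assumptions:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    public class EnrichFromCsv extends RichMapFunction<Event, EnrichedEvent> {

        private transient Map<String, String[]> areaCodes;

        @Override
        public void open(Configuration parameters) throws Exception {
            // runs once per parallel instance, before any event is processed
            areaCodes = new HashMap<>();
            for (String line : Files.readAllLines(Paths.get("/shared/area-codes.csv"))) {
                String[] cols = line.split(",");  // assumed layout: areaCode,city,lat,lon
                areaCodes.put(cols[0], cols);
            }
        }

        @Override
        public EnrichedEvent map(Event event) {
            String[] cols = areaCodes.get(event.getAreaCode());
            // city is null when the area code is unknown
            return new EnrichedEvent(event, cols == null ? null : cols[1]);
        }
    }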



Re: Best way to link static data to event data?

2019-09-27 Thread John Smith
It's a fairly small static file that may update once in a blue moon lol. But
I'm hoping to use existing functions. Why can't I just use the CSV-to-table
source?

Why should I have to either write my own CSV parser or look for a 3rd-party
one, then put the data in a Java Map and look it up in that map? I'm finding
Flink to be a bit of death by 1000 paper cuts lol

If I put the CSV in a table, I can then join across it with the event, no?



Re: Best way to link static data to event data?

2019-09-27 Thread Sameer Wadkar
The main consideration in these types of scenarios is not the type of source
function you use. The key point is how the event operator gets the slow-moving
master data and caches it, and then recovers it if it fails and restarts
again.

It does not matter that the CSV file does not change often. It is possible that
the event operator may fail and restart; the CSV data needs to be made
available to it again.

In that scenario, the initial suggestion I made to pass the CSV data in the
constructor is not adequate by itself. You need to store it in operator
state, which allows the operator to recover it when it restarts on failure.

As long as the above takes place you have resiliency, and you can use any
suitable method or source. I have not used the Table source as much, but
connected streams and operator state have worked out for me in similar
scenarios.
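
A rough sketch of that combination (connected streams plus the lookup map
snapshotted into operator state), where Event and EnrichedEvent are assumed
POJOs and each CSV row is a plain String array (areaCode, city, geo):

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
    import org.apache.flink.util.Collector;

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Map;

    public class EnrichWithOperatorState
            implements CoFlatMapFunction<Event, String[], EnrichedEvent>,
                       CheckpointedFunction {

        private transient Map<String, String[]> lookup;
        private transient ListState<String[]> checkpointed;

        @Override
        public void flatMap1(Event event, Collector<EnrichedEvent> out) {
            String[] row = lookup.get(event.getAreaCode());
            if (row != null) {
                out.collect(new EnrichedEvent(event, row[1], row[2])); // city, geo
            }
        }

        @Override
        public void flatMap2(String[] row, Collector<EnrichedEvent> out) {
            lookup.put(row[0], row); // refresh the slow-moving master data
        }

        @Override
        public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
            // persist the current lookup map so a restart can recover it
            checkpointed.clear();
            checkpointed.addAll(new ArrayList<>(lookup.values()));
        }

        @Override
        public void initializeState(FunctionInitializationContext ctx) throws Exception {
            checkpointed = ctx.getOperatorStateStore()
                    .getListState(new ListStateDescriptor<>("area-codes", String[].class));
            lookup = new HashMap<>();
            for (String[] row : checkpointed.get()) { // empty on a fresh start
                lookup.put(row[0], row);
            }
        }
    }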

Sameer

Sent from my iPhone



Re: Best way to link static data to event data?

2019-09-30 Thread Gaël Renoux
Hi John,

I've had a similar requirement, and I've resorted to simply using a static
cache (I'm coding in Scala, so that's a lazy value on a singleton object; in
Java that would be a static value on some utility class, with a synchronized
lazy-loading getter). The value is reloaded after some duration, which adds a
small latency at regular intervals. Keep in mind that one instance of that
value will be loaded on each task manager (provided that at least one task
running on that task manager calls the getter).

If you're OK with restarting the job when your data changes, it would be
better to load it on start (no need to synchronize stuff). Just load it
inside your job initialization code (it will be executed within the job
manager) and pass that data as a parameter to your operator's constructor.
The data format must be serializable.
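
In Java, that static cache could look roughly like this (the path, the reload
interval, and the column layout are assumptions):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    public final class AreaCodeCache {

        // reload interval, i.e. the tolerated staleness of the data
        private static final long RELOAD_MS = 3_600_000L;

        private static Map<String, String[]> cache;
        private static long loadedAt;

        private AreaCodeCache() {}

        // synchronized lazy-loading getter; one copy lives per task manager JVM
        public static synchronized Map<String, String[]> get() throws IOException {
            long now = System.currentTimeMillis();
            if (cache == null || now - loadedAt > RELOAD_MS) {
                Map<String, String[]> fresh = new HashMap<>();
                for (String line : Files.readAllLines(Paths.get("/shared/area-codes.csv"))) {
                    String[] cols = line.split(",");  // assumed: areaCode,city,lat,lon
                    fresh.put(cols[0], cols);
                }
                cache = fresh;
                loadedAt = now;
            }
            return cache;
        }
    }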

Gaël



-- 
Gaël Renoux
Senior R&D Engineer, DataDome
M +33 6 76 89 16 52
E gael.ren...@datadome.co
W www.datadome.co





Re: Best way to link static data to event data?

2019-09-30 Thread John Smith
Ok, thanks. It's basically telephone area codes; they barely ever change.


Re: Best way to link static data to event data?

2019-09-30 Thread John Smith
Hi, so here is what I have done...

1- I load my CSV using the CSV table source.
2- I set up a Kafka stream to read my incoming events.
3- I map my events to a POJO.
4- I join the 2 tables.
5- I push the joined result to Elasticsearch.

This works absolutely fine. So what's the difference between this and the
proposed solutions above?
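
In code, steps 1, 2, 4 and 5 could look roughly like this against the Flink
1.8 Table API (the path, the field names, and the Elasticsearch sink variable
are assumptions, and eventStream is the Kafka source already mapped to a POJO):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

    // 1- CSV table source for the static area-code data
    CsvTableSource areaCodes = CsvTableSource.builder()
            .path("/data/area-codes.csv")  // assumed path
            .field("areaCode", Types.STRING)
            .field("city", Types.STRING)
            .field("geo", Types.STRING)
            .build();
    tEnv.registerTableSource("area_codes", areaCodes);

    // 2/3- register the mapped Kafka event stream as a table
    tEnv.registerDataStream("events", eventStream, "phoneNumber, areaCode");

    // 4- join the events with the static table on the parsed area code
    Table joined = tEnv.sqlQuery(
        "SELECT e.phoneNumber, a.city, a.geo " +
        "FROM events e JOIN area_codes a ON e.areaCode = a.areaCode");

    // 5- convert back to a stream and sink to Elasticsearch
    tEnv.toAppendStream(joined, Row.class).addSink(elasticsearchSink);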



Re: Best way to link static data to event data?

2019-10-01 Thread John Smith
Or are there advantages/disadvantages?
