Hi

I am a CS PhD student at UMN Twin Cities. I am trying to use Flink for 
implementing some very specific use-cases: (They may not seem relevant but I 
need to implement them or I at least need to know if it is possible to 
implement them in Flink)

Assumptions:
1. Data stream is of the form (key, value). We achieve this by the .key 
operation provided by Flink API.
2. By emitting a key, I mean sending/outputting its aggregated value to any 
data sink. 

1. For each Tumbling window in the Event Time space, for each key, I would like 
to aggregate its value until it crosses a particular threshold (same threshold 
for all the keys). As soon as the key’s aggregated value crosses this 
threshold, I would like to emit this key. At the end of every tumbling window, 
all the (key, value) aggregated pairs  would be emitted irrespective of whether 
they have crossed the threshold or not.

2. For each Tumbling window in the event time space, I would like to maintain a 
LRU cache which stores the keys along with their aggregated values and their 
latest arrival time. The least recently used (LRU) key would be the key whose 
latest arrival time is earlier than the latest arrival times of all the other 
keys present in the LRU cache. The LRU cache is of a limited size. So, it is 
possible that the number of unique keys in a particular window is greater than 
the size of LRU cache. Whenever any (key, value) pair arrives, if the key 
already exists, its aggregated value is updated with the value of the newly 
arrived value and its latest arrival time is updated with the current event 
time. If the key does not exist and there is some free slot in the LRU cache, 
it is added into the LRU. As soon as the LRU cache gets occupied fully and a 
new key comes in which does not exist in the LRU cache, we would like to emit 
the least recently used key to accommodate the newly arrived key. As in the 
case of 1, at the end of every tumbling window, all the (key, value) aggregated 
pairs in the LRU cache would be emitted.  

Would like to know how can we implement these algorithms using Flink. Any help 
would be greatly appreciated.

Dhruv Kumar
PhD Candidate
Department of Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me

Reply via email to