[ 
https://issues.apache.org/jira/browse/FLINK-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907304#comment-15907304
 ] 

Vladislav Pernin commented on FLINK-3089:
-----------------------------------------

Another use case is idempotency.

It could be handled by an in-memory cache snapshotted at checkpoint time, but 
the cache has to stay small because it lives in memory.
Example: 
https://github.com/jgrier/FilteringExample/blob/master/src/main/java/com/dataartisans/filters/DedupeFilterFunction.java

It could also be implemented by a MapState backed by RocksDB. It won't be in 
memory, so it can grow very large.
But it won't be possible to expire the state: entries that never see a 
duplicate are never queried again, so they remain in the state forever.
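
To illustrate what expiration would need to cover for the dedupe case, here is a minimal plain-Java sketch. It is not Flink API: the class ExpiringDedupeFilter, its methods, and the explicit nowMillis parameter are all hypothetical names chosen for the example. The point is the sweep() method, which removes old entries even when their key is never looked up again.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch (not Flink API): a dedupe filter whose entries expire after a TTL. */
final class ExpiringDedupeFilter<K> {
    private final long ttlMillis;
    // key -> time at which the key was first seen
    private final Map<K, Long> firstSeen = new HashMap<>();

    ExpiringDedupeFilter(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true if the key is new, or if its previous entry is older than the TTL. */
    boolean isFirstOccurrence(K key, long nowMillis) {
        Long seenAt = firstSeen.get(key);
        if (seenAt != null && nowMillis - seenAt < ttlMillis) {
            return false; // duplicate inside the TTL window
        }
        firstSeen.put(key, nowMillis);
        return true;
    }

    /**
     * Removes every entry older than the TTL, including keys that were
     * never queried again. Returns the number of removed entries.
     */
    int sweep(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<K, Long>> it = firstSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() >= ttlMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return firstSeen.size();
    }
}
```

Without something like sweep(), an entry is only ever dropped when its own key shows up again, which is exactly the case that never happens for events without duplicates.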

> State API Should Support Data Expiration
> ----------------------------------------
>
>                 Key: FLINK-3089
>                 URL: https://issues.apache.org/jira/browse/FLINK-3089
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataStream API, State Backends, Checkpointing
>            Reporter: Niels Basjes
>
> In some use cases (web analytics) there is a need to have a state per visitor 
> on a website (i.e. keyBy(sessionid) ).
> At some point the visitor simply leaves and no longer creates new events (so 
> a special 'end of session' event will not occur).
> The only way to determine that a visitor has left is by choosing a timeout, 
> like "After 30 minutes without events we consider the visitor 'gone'".
> Only after this (chosen) timeout has expired should we discard this state.
> In the Trigger part of Windows we can set a timer and close/discard this kind 
> of information. But that introduces the buffering effect of the window (which 
> in some scenarios is unwanted).
> What I would like is to be able to set a timeout on a specific OperatorState 
> value which I can update afterwards.
> This makes it possible to create a map function that assigns the right value 
> and that discards the state automatically.
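
The behaviour requested in the issue description can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Flink API: TimeoutState, update, get, and the explicit nowMillis argument are hypothetical names. Each update refreshes the timeout for that key, and a value that has been idle for the full timeout is discarded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch (not Flink API): per-key state with an idle timeout. */
final class TimeoutState<K, V> {
    private static final class Entry<V> {
        final V value;
        final long lastUpdateMillis;

        Entry(V value, long lastUpdateMillis) {
            this.value = value;
            this.lastUpdateMillis = lastUpdateMillis;
        }
    }

    private final long timeoutMillis;
    private final Map<K, Entry<V>> entries = new HashMap<>();

    TimeoutState(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Stores the value and restarts the timeout for this key. */
    void update(K key, V value, long nowMillis) {
        entries.put(key, new Entry<>(value, nowMillis));
    }

    /** Returns the value if it is still live; discards it if the timeout has passed. */
    Optional<V> get(K key, long nowMillis) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return Optional.empty();
        }
        if (nowMillis - e.lastUpdateMillis >= timeoutMillis) {
            entries.remove(key); // idle for the whole timeout: treat the visitor as gone
            return Optional.empty();
        }
        return Optional.of(e.value);
    }
}
```

With a 30 minute timeout, a session updated at minute 0 and again at minute 20 stays readable until minute 50. Note that this sketch only discards a value when its key is read; keys that are never read again would still need a periodic cleanup.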



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
