[ 
https://issues.apache.org/jira/browse/FLINK-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318013#comment-16318013
 ] 

Bowen Li edited comment on FLINK-3089 at 1/9/18 8:27 AM:
---------------------------------------------------------

[~sihuazhou] If we don't enforce deletion, TtlDB won't guarantee that data 
expires right when its TTL elapses, which may cause uncertainty somewhere. 
Frankly, I think it would only cause uncertainty in unit tests and would not 
impact production, but I want this limitation fully discussed up front.

Enforcing *strict TTL*, as you said, is costly for both the heap and RocksDB 
backends. So, taking a step back, I think Flink should probably adopt *a 
relaxed TTL policy like TtlDB's* - ["...when key-values inserted are meant to 
be removed from the db in a non-strict 'ttl' amount of time therefore, this 
guarantees that key-values inserted will remain in the db for at least ttl 
amount of time and the db will make efforts to remove the key-values as soon 
as possible after ttl seconds of their 
insertion."|https://github.com/facebook/rocksdb/wiki/Time-to-Live] This way, 
everything becomes much easier and more performant. What do you think?
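To make the relaxed semantics concrete, here is a minimal sketch in plain Java (a hypothetical helper, not the Flink state API or RocksDB's TtlDB): each value is stored with its insertion timestamp, an entry is guaranteed to survive for at least the TTL, and physical removal happens lazily, some time after expiry — mirroring TtlDB's best-effort cleanup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical illustration of a *relaxed* TTL policy: an entry remains
// readable for at least ttlMillis after insertion; once expired it is
// filtered out on read and deleted as a side effect, so the exact moment
// of physical removal is not guaranteed (best effort, like TtlDB).
class RelaxedTtlMap<K, V> {

    private static final class Holder<V> {
        final V value;
        final long insertedAt;
        Holder(V value, long insertedAt) {
            this.value = value;
            this.insertedAt = insertedAt;
        }
    }

    private final Map<K, Holder<V>> map = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so tests can control time

    RelaxedTtlMap(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        map.put(key, new Holder<>(value, clock.getAsLong()));
    }

    // Expired entries are never returned, but they are only removed when
    // touched here — analogous to TtlDB deleting them during compaction.
    V get(K key) {
        Holder<V> h = map.get(key);
        if (h == null) {
            return null;
        }
        if (clock.getAsLong() - h.insertedAt >= ttlMillis) {
            map.remove(key); // lazy cleanup, strictly after the TTL
            return null;
        }
        return h.value;
    }
}
```

The point of the sketch: strictness only matters to a reader, so correctness can be preserved by filtering on access while deferring actual deletion, which is what makes the relaxed policy cheap.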

Also, how do you distinguish processing time from event time with TtlDB? Do 
you approximate event time with processing time?



> State API Should Support Data Expiration (State TTL)
> ----------------------------------------------------
>
>                 Key: FLINK-3089
>                 URL: https://issues.apache.org/jira/browse/FLINK-3089
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataStream API, State Backends, Checkpointing
>            Reporter: Niels Basjes
>            Assignee: Bowen Li
>
> In some use cases (web analytics) there is a need to have state per visitor 
> on a website (i.e. keyBy(sessionid) ).
> At some point the visitor simply leaves and no longer creates new events (so 
> a special 'end of session' event will not occur).
> The only way to determine that a visitor has left is by choosing a timeout, 
> like "After 30 minutes of no events, we consider the visitor 'gone'".
> Only after this (chosen) timeout has expired should we discard this state.
> In the Trigger part of Windows we can set a timer and close/discard this kind 
> of information. But that introduces the buffering effect of the window (which 
> in some scenarios is unwanted).
> What I would like is to be able to set a timeout on a specific state which I 
> can update afterwards.
> This makes it possible to create a map function that assigns the right value 
> and that discards the state automatically.
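The per-key timeout the description asks for can be sketched in plain Java (again a hypothetical helper, not the Flink state API): every update refreshes the key's deadline, and a periodic sweep — the stand-in for a timer — discards state for visitors that have gone quiet for 30 minutes.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical per-visitor session state with an idle timeout, as in the
// web-analytics use case above. Updating a key's state pushes its
// expiration deadline forward; expire() plays the role of a timer firing.
class SessionStore {
    static final long TIMEOUT_MS = 30 * 60 * 1000L; // 30 minutes

    private static final class Session {
        long lastSeen;
        int eventCount;
    }

    private final Map<String, Session> sessions = new HashMap<>();

    void onEvent(String sessionId, long nowMs) {
        Session s = sessions.computeIfAbsent(sessionId, id -> new Session());
        s.lastSeen = nowMs; // each event extends the session's lifetime
        s.eventCount++;
    }

    // Called periodically: drop every session idle for >= 30 minutes and
    // return how many were discarded.
    int expire(long nowMs) {
        int removed = 0;
        Iterator<Session> it = sessions.values().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().lastSeen >= TIMEOUT_MS) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    boolean isActive(String sessionId) {
        return sessions.containsKey(sessionId);
    }
}
```

Unlike a window trigger, nothing is buffered here: state is updated in place and only its lifetime is managed, which is exactly the behavior the issue requests.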



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
