Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17179#discussion_r105315079

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala ---
    @@ -61,25 +65,49 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalKeyedState
      * - After that, if `update(newState)` is called, then `exists()` will again return `true`,
      *   `get()` and `getOption()`will return the updated value.
      *
    + * Important points to note about using `KeyedStateTimeout`.
    + * - The timeout type is a global param across all the keys (set as `timeout` param in
    + *   `[map|flatMap]GroupsWithState`, but the exact timeout duration is configurable per key
    + *   (by calling `setTimeout...()` in `KeyedState`).
    + * - When the timeout occurs for a key, the function is called with no values, and
    + *   `KeyedState.isTimingOut()` set to true.
    + * - The timeout is reset for key every time the function is called on the key, that is,
    + *   when the key has new data, or the key has timed out. So the user has to set the timeout
    + *   duration every time the function is called, otherwise there will not be any timeout set.
    + * - Guarantees provided on processing-time-based timeout of key, when timeout duration is D ms:
    + *   - Timeout will never be called before real clock time has advanced by D ms
    + *   - Timeout will be called eventually when there is a trigger with any data in it
    --- End diff --

    Note that there is a fundamental difference between Spark Streaming and Structured Streaming. Unlike the former, Structured Streaming queries will not execute any batch if there is no new data. So as long as there is some data and a batch kicks off, timeouts will be processed.

    I think "a trigger with any data" is confusing. A trigger by definition has data. I will rewrite. Thanks for pointing it out, though.
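To make the guarantee concrete, here is a minimal toy model in plain Scala. It is not the Spark implementation and uses no Spark API; `TimeoutModel`, `Engine`, and `trigger` are hypothetical names made up for illustration. It assumes only the behavior described above: a batch runs only when there is new data, an expired timeout is processed the next time a batch runs, and the timeout for a key is reset whenever the function is called on that key.

```scala
// Toy model of processing-time timeouts. NOT Spark code; all names are hypothetical.
object TimeoutModel {

  // deadlines: key -> wall-clock time (ms) at which that key's timeout expires.
  final case class Engine(deadlines: Map[String, Long] = Map.empty) {

    // newData: keys that received data in this trigger, mapped to the timeout
    // duration (ms) the user function sets for them when it is called.
    // Returns the updated engine and the set of keys that timed out in this batch.
    def trigger(nowMs: Long, newData: Map[String, Long]): (Engine, Set[String]) =
      if (newData.isEmpty) {
        // No new data: no batch is executed, so expired timeouts are not processed yet.
        (this, Set.empty[String])
      } else {
        // A batch runs: keys past their deadline that got no new data time out.
        val timedOut = deadlines.collect {
          case (key, deadline) if deadline <= nowMs && !newData.contains(key) => key
        }.toSet
        // Keys with data get a fresh deadline; timed-out keys lose theirs
        // (in this model the function does not set a new timeout for them).
        val refreshed = newData.map { case (key, durationMs) => key -> (nowMs + durationMs) }
        (Engine((deadlines -- timedOut) ++ refreshed), timedOut)
      }
  }

  def main(args: Array[String]): Unit = {
    val e0 = Engine()
    val (e1, t1) = e0.trigger(0L, Map("a" -> 100L))   // "a" gets data, deadline at 100 ms
    val (e2, t2) = e1.trigger(500L, Map.empty)        // deadline passed, but no data: no batch
    val (e3, t3) = e2.trigger(600L, Map("b" -> 100L)) // data for "b" starts a batch; "a" times out
    println(s"t1=$t1 t2=$t2 t3=$t3 deadlines=${e3.deadlines}")
  }
}
```

In this model the timeout for "a" is never reported before its 100 ms have elapsed, and it is reported only at the first trigger after that which carries data for some key, which is the "eventually" in the guarantee.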