[ 
https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622623#comment-16622623
 ] 

Jungtaek Lim edited comment on SPARK-10816 at 9/20/18 8:06 PM:
---------------------------------------------------------------

I'm aware of the needs for more advanced cases (like dynamic gap session 
window), but for simple case of session window we still have a chance to make 
it pretty simple. For DSL we may want to provide advanced (complicated) cases, 
but for SQL why not support basic case which can be expressed as SQL statement?

map/flatMapGroupsWithState is something reserved for experienced users: end 
users have to understand how typed API works, and the limitation of defining 
watermark in metadata of column (when row is serialized to object the 
information is lost. there's relevant issue in Spark JIRA but we just identify 
it as limitation and end users are dealing with it), how to craft state 
function correctly.

Moreover, as I mentioned in doc, map/flapMapGroupsWithState don't handle 
multiple sessions in same key which is even not enough to handle fixed gap of 
session window. Event time and watermark would require us to deal with 
arbitrary changes of sessions, like multiple sessions which are not yet target 
of eviction, as well as multiple sessions being merged into one due to late 
event. Current mechanism of map/flapMapGroupsWithState don't handle this 
(unless it is changed to support multiple values on group key, then state 
function should be changed too), and at least require end users to deal with it 
at their own hands.


was (Author: kabhwan):
I'm aware of the needs for more advanced cases (like dynamic gap session 
window), but for simple case of session window we still have a chance to make 
it pretty simple. For DSL we may want to provide advanced (complicated) cases, 
but for SQL why not support basic case which can be expressed as SQL statement?

map/flatMapGroupsWithState is something reserved for experienced users: end 
users have to understand how typed API works, and the limitation of defining 
watermark in metadata of column (when row is serialized to object the 
information is lost. there's relevant issue in Spark JIRA but we just identify 
it as limitation and end users are dealing with it), how to craft state 
function correctly.

Moreover, as I mentioned in doc, map/flapMapGroupsWithState don't handle 
multiple sessions in same key which is even not enough to handle fixed gap of 
session window. Event time and watermark would require us to deal with 
arbitrary changes of sessions, like multiple sessions which are not yet target 
of eviction, as well as multiple sessions being merged into one due to late 
event. Current mechanism of map/flapMapGroupsWithState don't handle this, and 
at least require end users to deal with it at their own hands.

> EventTime based sessionization
> ------------------------------
>
>                 Key: SPARK-10816
>                 URL: https://issues.apache.org/jira/browse/SPARK-10816
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-10816 Support session window natively.pdf
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to