[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651404#comment-16651404 ]
Jungtaek Lim commented on SPARK-10816:
--------------------------------------

Update: I've crafted another performance test that runs the same query against two different data patterns: [https://github.com/HeartSaVioR/iot-trucking-app-spark-structured-streaming/tree/benchmarking-SPARK-10816]

I've separated the two data patterns into different packages just for simplicity; the class names are the same.

Data pattern 1: plenty of rows in the same session
[https://github.com/HeartSaVioR/iot-trucking-app-spark-structured-streaming/tree/benchmarking-SPARK-10816/src/main/scala/com/hortonworks/spark/benchmark/streaming/sessionwindow/plenty_of_rows_in_session]

Data pattern 2: plenty of sessions
[https://github.com/HeartSaVioR/iot-trucking-app-spark-structured-streaming/tree/benchmarking-SPARK-10816/src/main/scala/com/hortonworks/spark/benchmark/streaming/sessionwindow/plenty_of_sessions]

While running the benchmark with data pattern 2, I found some performance hits in my patch, so I made some fixes as well. Most of the fixes reduce the amount of generated code, but there is also one major change: pre-merging sessions within a local partition is now optional, since it seriously hurts performance with data pattern 2.

The patch is still sub-optimal on the state store side. I guess that is now the major bottleneck in my patch, so I'm wrapping my head around finding good alternatives. Baidu's list state would be one of them, since I realized [3] might write more deltas as well as require more operations.
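To illustrate the core idea behind the two data patterns, here is a minimal, Spark-independent sketch of gap-based session merging: events whose timestamps fall within the gap of the current session's end extend that session, otherwise they open a new one. The `Session` case class, field names, and merge rule are illustrative assumptions for this comment, not the patch's actual implementation.

```scala
// Sketch: merge event timestamps into sessions using a gap timeout.
// Under pattern 1 (many rows per session) the fold mostly extends one
// session; under pattern 2 (many sessions) it mostly creates new ones.
case class Session(start: Long, end: Long, count: Int)

object SessionMerge {
  def merge(events: Seq[Long], gapMs: Long): Seq[Session] =
    events.sorted.foldLeft(List.empty[Session]) {
      // First event opens the first session.
      case (Nil, ts) => List(Session(ts, ts, 1))
      // Within the gap of the latest session: extend it.
      case (head :: tail, ts) if ts - head.end <= gapMs =>
        head.copy(end = ts, count = head.count + 1) :: tail
      // Beyond the gap: start a new session.
      case (acc, ts) => Session(ts, ts, 1) :: acc
    }.reverse
}
```

The cost asymmetry between the two patterns is visible even here: pattern 2 allocates a new session per event, which is why a mandatory pre-merge step buys nothing and only adds overhead in that case.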
> EventTime based sessionization
> ------------------------------
>
>                 Key: SPARK-10816
>                 URL: https://issues.apache.org/jira/browse/SPARK-10816
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-10816 Support session window natively.pdf, Session Window Support For Structure Streaming.pdf

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)