[ 
https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672478#comment-16672478
 ] 

Jungtaek Lim commented on SPARK-10816:
--------------------------------------

UPDATE: I discovered a performance-critical issue in my patch (I guess the 
same issue occurs in Baidu's patch) and fixed it yesterday.

1. Plenty Of Sessions In Key / Append Mode (rate: 500,000)

* HWX, Linked List version: max around 250,000
* HWX, Latest: max around 235,000
* flatMapGroupsWithState (with my state func. implementation): max around 
250,000

All of them max out the CPU. (I ran with local[3], and all 3 CPU cores were at 
100% user.)
When I increased the rate to 1,000,000, the max rate went up to around 
350,000 for the linked list version.

2. Plenty Of Keys / Append Mode (rate: 5,000,000)

* HWX, Linked List version: max around 3,700,000
* HWX, Latest: max around 4,000,000
* flatMapGroupsWithState (with my state func. implementation): max around 
2,100,000

3. Plenty Of Rows In Session / Append Mode (rate: 50,000)

* HWX, Linked List version: max around 27,000
* HWX, Latest: max around 36,000
* flatMapGroupsWithState (with my state func. implementation): max around 26,000
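
To make the three results easier to compare, here is a small sketch that divides 
each max rate by the flatMapGroupsWithState one. The numbers are copied from the 
lists above; the dictionary keys and the helper name relative_to_baseline are only 
illustrative, not part of any patch.

```python
# Max rates copied from the three benchmarks above.
# "fmgws" stands for the manual flatMapGroupsWithState implementation.
results = {
    "plenty of sessions in key": {"linked_list": 250_000, "latest": 235_000, "fmgws": 250_000},
    "plenty of keys": {"linked_list": 3_700_000, "latest": 4_000_000, "fmgws": 2_100_000},
    "plenty of rows in session": {"linked_list": 27_000, "latest": 36_000, "fmgws": 26_000},
}


def relative_to_baseline(row, baseline="fmgws"):
    """Return each version's max rate divided by the baseline's, rounded to 2 digits."""
    base = row[baseline]
    return {name: round(value / base, 2) for name, value in row.items() if name != baseline}


for name, row in results.items():
    print(name, relative_to_baseline(row))
```

A ratio above 1.0 means the version reached a higher max rate than the manual 
flatMapGroupsWithState implementation in that test.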

Please note that the linked list version doesn't materialize all sessions (even 
those in a given key) into memory, so it shows close or even better performance 
than the manual implementation with flatMapGroupsWithState, and it also gets rid 
of the concern about loading sessions into memory. The linked list version is 
sometimes faster than the latest version (which loads all sessions in a group 
into memory) and sometimes slower.
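
To illustrate the general idea of not holding every session in memory, here is a 
minimal sketch in plain Python. It is not the code in either patch and doesn't use 
any Spark API; it assumes event times for a single key arrive sorted in ascending 
order, and the function name sessionize and the gap parameter are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def sessionize(timestamps: Iterable[int], gap: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (session_start, session_end, row_count) one session at a time.

    Assumes timestamps are sorted ascending. A session is closed as soon as
    the next event arrives `gap` or more after the last event, so only the
    current session is kept in memory.
    """
    start = last = None
    count = 0
    for ts in timestamps:
        if start is None:
            # First event opens the first session.
            start, last, count = ts, ts, 1
        elif ts - last < gap:
            # Event falls within the gap: extend the current session.
            last = ts
            count += 1
        else:
            # Gap exceeded: emit the finished session and open a new one.
            yield (start, last + gap, count)
            start, last, count = ts, ts, 1
    if start is not None:
        # Emit the session that was still open when the input ended.
        yield (start, last + gap, count)


sessions = list(sessionize([1, 2, 3, 20, 21, 40], gap=10))
print(sessions)  # [(1, 13, 3), (20, 31, 2), (40, 50, 1)]
```

Because the function is a generator, a caller can consume each session as it is 
emitted instead of collecting all sessions for the key first.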

I'll update my patch based on the linked list version. I guess there's no 
performance issue now, so I'm waiting for committers' review.

> EventTime based sessionization
> ------------------------------
>
>                 Key: SPARK-10816
>                 URL: https://issues.apache.org/jira/browse/SPARK-10816
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-10816 Support session window natively.pdf, Session 
> Window Support For Structure Streaming.pdf
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org