[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672478#comment-16672478 ]
Jungtaek Lim commented on SPARK-10816: -------------------------------------- UPDATE: I just discovered the performance critical issue on my patch (I guess same issue is occurring on Baidu's patch) and just fixed yesterday. 1. Plenty Of Sessions In Key / Append Mode (rate: 500,000) * HWX, Linked List version: max around 250,000 * HWX, Latest: max around 235,000 * flatMapGroupsWithState (with my state func. implementation): max around 250,000 They're showing CPU being maxed out. (Ran with local[3], and 3 cores of CPU are all 100% user.) When I increase rate to 1,000,000 I observed max rate would go up to around 350,000 for linked list version. 2. Plenty Of Keys / Append Mode (rate: 5,000,000) * HWX, Linked List version: max around 3,700,000 * HWX, Latest: max around 4,000,000 * flatMapGroupsWithState (with my state func. implementation): max around 2,100,000 3. Plenty Of Rows In Session / Append Mode (rate: 50000) * HWX, Linked List version: max around 27,000 * HWX, Latest: max around 36,000 * flatMapGroupsWithState (with my state func. implementation): max around 26,000 Please note that Linked list version doesn't materialize all sessions (even in given key) into memory, so it shows close or even better performance than manual implementation of flatMapGroupsWithState, as well as get rid of concern on loading sessions in memory. Linked list version sometimes faster than latest version (load all sessions in group memory) and sometimes slower. I'll update my patch against linked list version. I guess there's no performance issue now, now waiting for committers' review. > EventTime based sessionization > ------------------------------ > > Key: SPARK-10816 > URL: https://issues.apache.org/jira/browse/SPARK-10816 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming > Reporter: Reynold Xin > Priority: Major > Attachments: SPARK-10816 Support session window natively.pdf, Session > Window Support For Structure Streaming.pdf > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org