[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698433#comment-16698433 ]
huangtengfei commented on SPARK-10816:
--------------------------------------

I ran the benchmark [~kabhwan] mentioned above last week and found the key performance issue in the original [Baidu patch|https://github.com/apache/spark/pull/22583]. After applying a [fix patch|https://github.com/apache/spark/pull/22583/commits/672bccb64e75b009179e00fe6ede9bf34b5b4dbb], I reran the benchmark and got the following results (cc [~XuanYuan]):

Environment: local[3], 35G driver memory, 2.3GHz CPU.

A. Plenty Of Rows In Session / Append Mode (rate: 50,000)
||batchId||input rows||input rows per second||processed rows per second||
|2|83300|16410.55949566588|14933.667981355324|
|3|178500|31949.167710757116|19739.02465995798|
|4|414200|45752.78913067491|22497.420020639835|
|5|950000|51571.57591878834|25623.044557125904|
|6|1850000|49885.39840906027|26061.84405156019|
|7|3550000|50003.5213747447|25037.55633450175|

B. Plenty Of Keys / Append Mode (rate: 5,000,000)
||batchId||input rows||input rows per second||processed rows per second||
|2|8333325|1601638.4778012685|1626331.967213115|
|3|17857125|3477531.64556962|2343762.304764405|
|4|31428550|4118536.2337832525|2693337.046876339|
|5|60000000|5137426.149499101|3087372.6458783573|
|6|95000000|4885574.697865775|3265278.064205678|
|7|145000000|4981790.69607641|2888561.297262839|
|8|255000000|5078871.89292543|2521157.9530174797|

> EventTime based sessionization
> ------------------------------
>
>                 Key: SPARK-10816
>                 URL: https://issues.apache.org/jira/browse/SPARK-10816
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-10816 Support session window natively.pdf, Session Window Support For Structure Streaming.pdf