[ 
https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698433#comment-16698433
 ] 

huangtengfei commented on SPARK-10816:
--------------------------------------

Last week I ran the benchmark that [~kabhwan] mentioned above and found the 
key performance issue in the original [Baidu 
patch|https://github.com/apache/spark/pull/22583]. After applying a [fix 
patch|https://github.com/apache/spark/pull/22583/commits/672bccb64e75b009179e00fe6ede9bf34b5b4dbb],
 I reran the benchmark and got the following results (cc [~XuanYuan]):

Ran with local[3] and 35G of driver memory (2.3 GHz CPU).

A. Plenty Of Rows In Session / Append Mode (rate: 50,000)
||batchId||input rows||input rows per second||processed rows per second||
|2|83300|16410.55949566588|14933.667981355324|
|3|178500|31949.167710757116|19739.02465995798|
|4|414200|45752.78913067491|22497.420020639835|
|5|950000|51571.57591878834|25623.044557125904|
|6|1850000|49885.39840906027|26061.84405156019|
|7|3550000|50003.5213747447|25037.55633450175|

B. Plenty Of Keys / Append Mode (rate: 5,000,000)
||batchId||input rows||input rows per second||processed rows per second||
|2|8333325|1601638.4778012685|1626331.967213115|
|3|17857125|3477531.64556962|2343762.304764405|
|4|31428550|4118536.2337832525|2693337.046876339|
|5|60000000|5137426.149499101|3087372.6458783573|
|6|95000000|4885574.697865775|3265278.064205678|
|7|145000000|4981790.69607641|2888561.297262839|
|8|255000000|5078871.89292543|2521157.9530174797|
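
As a reading aid, the two rate columns can be compared per batch by dividing 
processed rows per second by input rows per second. Below is a minimal sketch 
in Python that uses only the numbers from the tables above. The helper name 
processed_to_input_ratio is illustrative and is not part of Spark or of the 
patch.

```python
# Each tuple: (batchId, input rows, input rows per second, processed rows per second).
# Values are copied from tables A and B in this comment.
PLENTY_OF_ROWS_IN_SESSION = [
    (2, 83300, 16410.55949566588, 14933.667981355324),
    (3, 178500, 31949.167710757116, 19739.02465995798),
    (4, 414200, 45752.78913067491, 22497.420020639835),
    (5, 950000, 51571.57591878834, 25623.044557125904),
    (6, 1850000, 49885.39840906027, 26061.84405156019),
    (7, 3550000, 50003.5213747447, 25037.55633450175),
]

PLENTY_OF_KEYS = [
    (2, 8333325, 1601638.4778012685, 1626331.967213115),
    (3, 17857125, 3477531.64556962, 2343762.304764405),
    (4, 31428550, 4118536.2337832525, 2693337.046876339),
    (5, 60000000, 5137426.149499101, 3087372.6458783573),
    (6, 95000000, 4885574.697865775, 3265278.064205678),
    (7, 145000000, 4981790.69607641, 2888561.297262839),
    (8, 255000000, 5078871.89292543, 2521157.9530174797),
]


def processed_to_input_ratio(rows):
    """Map batchId -> processed rows per second / input rows per second."""
    return {
        batch_id: processed_rate / input_rate
        for batch_id, _input_rows, input_rate, processed_rate in rows
    }


if __name__ == "__main__":
    for name, rows in [
        ("A. Plenty Of Rows In Session", PLENTY_OF_ROWS_IN_SESSION),
        ("B. Plenty Of Keys", PLENTY_OF_KEYS),
    ]:
        print(name)
        for batch_id, ratio in processed_to_input_ratio(rows).items():
            print(f"  batch {batch_id}: processed/input = {ratio:.3f}")
```

A ratio below 1 means that the batch was processed at a lower rate than the 
rate at which its input arrived.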

> EventTime based sessionization
> ------------------------------
>
>                 Key: SPARK-10816
>                 URL: https://issues.apache.org/jira/browse/SPARK-10816
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-10816 Support session window natively.pdf, Session 
> Window Support For Structure Streaming.pdf
>
>



