[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654627#comment-16654627 ]

Jungtaek Lim edited comment on SPARK-10816 at 10/18/18 4:36 AM:
----------------------------------------------------------------

I just ran another performance test to check my new attempt at improving state
handling.

Here I tried overwriting the values for a given key, instead of removing all
values and appending the new values for that key:
https://github.com/HeartSaVioR/spark/commit/6d466b9f424ae6a2b5a927e650f60ef35cfe30ca
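For context, the idea can be sketched roughly as follows, as a minimal Python illustration over a plain dict standing in for the real state store (the function names are mine, not from the commit; the actual change is in the linked commit):

```python
# Hypothetical sketch: a plain dict stands in for the state store.

def update_remove_then_append(store, key, new_values):
    # Current approach: drop all existing values for the key,
    # then append each new value one by one.
    store.pop(key, None)
    for v in new_values:
        store.setdefault(key, []).append(v)

def update_overwrite(store, key, new_values):
    # Trialed approach: overwrite the key's values in a single put.
    store[key] = list(new_values)

store_a, store_b = {}, {}
update_remove_then_append(store_a, "session-1", [1, 2, 3])
update_overwrite(store_b, "session-1", [1, 2, 3])
assert store_a == store_b  # same end state; only the write pattern differs
```

Both strategies leave the store in the same state; the hope was that a single put per key would beat a remove plus N appends, which the numbers below did not bear out.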

The result was no luck (a small performance hit compared to the current
approach), so I won't post those numbers here. However, I ran the tests on an
AWS c5d.xlarge instance with the dedicated tenancy option, a more isolated and
stable environment than before, which shows a higher input rate.

Test Env.: c5d.xlarge, dedicated

A. plenty of sessions

1. HWX (Append Mode) 

1.a. input rate 20000

||batch id||input rows||input rows per second||processed rows per second||
| 21 | 113355 | 20234.7375937 | 19278.0612245 |
| 22 | 118905 | 20218.5002551 | 17958.7675578 |
| 23 | 120000 | 18121.4134703 | 15622.9657597 |
| 24 | 160000 | 20827.9093986 | 14406.6270484 |
| 25 | 220000 | 19807.3287116 | 12593.0165999 |

2. Baidu (Append Mode)

2.a. input rate 15000

||batch id||input rows||input rows per second||processed rows per second||
| 18 | 1005000 | 15068.3699172 | 5993.05878565 |
| 19 | 2505000 | 14937.8335669 | 4823.00254531 |

(cancelled since the following batch took too long; the processed rate couldn't even reach 10000 rows/sec)
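The cancellation criterion is visible in the numbers: whenever the processed rate stays below the input rate, unprocessed rows accumulate and each successive batch grows. A rough Python sketch of that dynamic (the rates and the fixed batch interval are illustrative assumptions, not the measured values):

```python
# Rough model: if the processed rate stays below the input rate, the
# backlog (rows waiting for the next batch) grows without bound,
# so each batch is larger than the last.
def batch_sizes(input_rate, processed_rate, batch_secs, n_batches):
    backlog, sizes = 0, []
    for _ in range(n_batches):
        backlog += input_rate * batch_secs   # rows that arrived this interval
        sizes.append(backlog)                # next batch tries to take them all
        backlog -= min(backlog, processed_rate * batch_secs)
    return sizes

# e.g. 15000 rows/s arriving, only ~5000 rows/s processed, 60s intervals
print(batch_sizes(15000, 5000, 60, 3))  # each batch larger than the last
```

This mirrors the input-rows column above (1005000, then 2505000): once the job falls behind, batch sizes snowball, which is why these runs were cancelled rather than left to finish.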

3. HWX (Update Mode)

3.a. input rate 15000

||batch id||input rows||input rows per second||processed rows per second||
| 25 | 165000 | 15136.2260343 | 15351.6933383 |
| 26 | 165000 | 15350.2651409 | 28128.196386 |
| 27 | 90000 | 15342.6525742 | 16669.7536581 |
| 28 | 75000 | 13888.8888889 | 13557.483731 |
| 29 | 90000 | 16266.0401229 | 15131.1365165 |
| 30 | 90000 | 15128.5930408 | 13829.1333743 |

3.b. input rate 20000

||batch id||input rows||input rows per second||processed rows per second||
| 23 | 318210 | 19853.3815822 | 20039.6750425 |
| 24 | 320000 | 20151.1335013 | 23456.9711186 |
| 25 | 280000 | 20523.3453053 | 15197.5683891 |

(cancelled since the following batch took too long)

B. plenty of rows in session

1. HWX (Append Mode)

1.a. input rate 30000

||batch id||input rows||input rows per second||processed rows per second||
| 21 | 295730 | 30210.4402901 | 25682.1537125 |
| 22 | 360000 | 31260.8544634 | 25906.7357513 |
| 23 | 420000 | 30222.3501475 | 28753.337441 |
| 24 | 420000 | 28751.3691128 | 29702.970297 |
| 25 | 420000 | 29700.8698112 | 28561.7137028 |

1.b. input rate 35000

||batch id||input rows||input rows per second||processed rows per second||
| 19 | 441716 | 36073.1727236 | 29971.2308319 |
| 20 | 490000 | 33245.1319628 | 28194.9479257 |
| 21 | 630000 | 36250.647333 | 30189.7642323 |
| 22 | 735000 | 35219.703867 | 28420.0757869 |
| 23 | 910000 | 35185.3999923 | 30372.8179967 |

2. Baidu (Append Mode)

2.a. input rate 35000

||batch id||input rows||input rows per second||processed rows per second||
| 1 | 4335 | 752.081887578 | 111.233706251 |

(cancelled due to a long-running batch; it can't even keep up with an input
rate of 1000, as we already know)





> EventTime based sessionization
> ------------------------------
>
>                 Key: SPARK-10816
>                 URL: https://issues.apache.org/jira/browse/SPARK-10816
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-10816 Support session window natively.pdf, Session 
> Window Support For Structure Streaming.pdf
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
