[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

HeartSaVioR Thu, 02 Aug 2018 02:41:06 -0700

Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21733
  
    @tdas 
    
    I found some spare time to run performance tests, though I've run only
    one app so far, since I couldn't run the tests concurrently. Please let me
    know if results from a single app are not convincing enough: I'll find more
    time to go through all the test cases. I hope these numbers give enough
    confidence to accept the patch.
    
    > Machine info.
    
    MBP 15-inch Mid 2015
    
    * i7 2.5 GHz (4 cores)
    * 16 GB 1600 MHz DDR3
    * 512 GB SSD
    
    > Test information
    
    * base commit : c9914cf (latest master branch)
    * patch rebased locally onto the base commit before testing
    * spark-submit options: --master local[3] --driver-memory 6g
      * I didn't run the perf. test with all cores and memory: I left some
        spare resources for the OS and background apps.
    
    > Performance test code
    
    
https://github.com/HeartSaVioR/iot-trucking-app-spark-structured-streaming/blob/master/src/main/scala/com/hortonworks/spark/benchmark/BenchmarkMovingAggregationsListener.scala
    
    Please note that there are 4 more apps (big key size, big value size,
    many key columns, many value columns) in the same repository.
    
    > Test result
    
    Neither version kept up with the configured rate of 200000 rows per
    second, but since both processed around 188000 rows per second, I didn't
    see a need to tune the rate more tightly (e.g. 185000 or 190000).
    
    The input rows per second and processed rows per second figures are
    averages over 3 batches (38, 39, and 40). The state figures were taken
    when total state rows reached 60000.
    
    | version | input rows per second | processed rows per second | total state rows | used bytes of current state version |
    | ---- | ---- | ---- | ---- | ---- |
    | latest master (c9914cf) | 200492.065 | 188880.316 | 60000 | 17,755,895 |
    | patch (on top of c9914cf) | 199242.598 | 188160.833 | 60000 | 14,687,543 |
    
    So while the processed rows per second of the two versions show no
    meaningful difference (under 1%), the patch reduced the memory usage of
    state (for the latest version) by about 17.3%. One thing to note: in this
    performance test, state is saved to the local SSD. The patch may show a
    (small? trivial?) performance benefit when the checkpoint directory is
    remote.
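
    For anyone who wants to re-check the percentages, here is a minimal
    sketch that re-derives them from the table. It uses only the figures
    reported in this comment; the helper name `percent_reduction` and the
    per-row estimate are my own illustration, not part of the benchmark code.

```python
# Figures copied from the result table in this comment.
TOTAL_STATE_ROWS = 60_000
master_state_bytes = 17_755_895
patch_state_bytes = 14_687_543
master_processed_rps = 188880.316
patch_processed_rps = 188160.833


def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Reduction in used bytes of the current state version.
state_reduction = percent_reduction(master_state_bytes, patch_state_bytes)

# Difference in processed rows per second between master and patch.
throughput_drop = percent_reduction(master_processed_rps, patch_processed_rps)

# Rough per-row footprint (illustrative only: used bytes / total state rows).
master_bytes_per_row = master_state_bytes / TOTAL_STATE_ROWS
patch_bytes_per_row = patch_state_bytes / TOTAL_STATE_ROWS

print(f"state size reduction: {state_reduction:.2f}%")
print(f"processed rows/s difference: {throughput_drop:.2f}%")
print(f"bytes per state row: {master_bytes_per_row:.1f} -> {patch_bytes_per_row:.1f}")
```

    The per-row number is only a rough average, since used bytes may include
    overhead that does not scale with the row count.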


