Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/17112

The unit test "Measurement for SPARK-16929." added in this PR is the measurement. In TaskSetManagerSuite.scala line 1049, if `newAlgorithm=true`, `successfulTaskIdsSet` is used to get the median duration; if `newAlgorithm=false`, the old algorithm (`Arrays.sort`) is used. I measure the time spent getting the median duration in TaskSetManager.scala line 957.

If `tasksNum=1000` (TaskSetManagerSuite.scala line 1043), the measurements are as below:

| newAlgorithm | time cost |
| ------ | ------ |
| false | 5ms, 3ms, 4ms, 3ms, 3ms |
| true | 2ms, 4ms, 2ms, 2ms, 3ms |

If `tasksNum=100000`:

| newAlgorithm | time cost |
| ------ | ------ |
| false | 107ms, 109ms, 103ms, 100ms, 107ms |
| true | 17ms, 14ms, 14ms, 13ms, 14ms |

If `tasksNum=150000`:

| newAlgorithm | time cost |
| ------ | ------ |
| false | 133ms, 146ms, 127ms, 163ms, 114ms |
| true | 14ms, 13ms, 15ms, 16ms, 14ms |

As the tables show, the new algorithm (`TreeSet`) performs better than the old algorithm (`Arrays.sort`). When `tasksNum=100000`, `Arrays.sort` costs over 100ms on every run, while the new algorithm stays below 20ms.
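To illustrate the difference being measured, here is a minimal, hypothetical sketch (not the PR's actual code; class and method names are made up for illustration). The old approach re-sorts a copy of all task durations on every median query; the alternative keeps successful-task durations in an ordered `TreeSet` as tasks finish, so a query only walks to the middle element instead of sorting from scratch:

```java
import java.util.Arrays;
import java.util.TreeSet;

public class MedianDemo {

    // Old approach: re-sort a copy of all durations on every query.
    static long medianBySort(long[] durations) {
        long[] sorted = durations.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // New approach: maintain an ordered set of (duration, taskId) pairs.
    // The taskId component keeps duplicate durations as distinct entries,
    // since a TreeSet would otherwise collapse them.
    static final class MedianTracker {
        private final TreeSet<long[]> set =
            new TreeSet<>((a, b) -> a[0] != b[0]
                ? Long.compare(a[0], b[0])
                : Long.compare(a[1], b[1]));

        void insert(long duration, long taskId) {
            set.add(new long[] {duration, taskId});
        }

        // Walk to the middle element: O(n) per query, but avoids the
        // O(n log n) sort and the full-array copy on every call.
        long median() {
            int mid = set.size() / 2;
            int i = 0;
            for (long[] e : set) {
                if (i++ == mid) return e[0];
            }
            throw new IllegalStateException("empty tracker");
        }
    }

    public static void main(String[] args) {
        long[] durations = {30, 10, 50, 20, 40};
        MedianTracker tracker = new MedianTracker();
        for (int i = 0; i < durations.length; i++) {
            tracker.insert(durations[i], i);
        }
        System.out.println(medianBySort(durations)); // prints 30
        System.out.println(tracker.median());        // prints 30
    }
}
```

The incremental structure wins because insertions happen anyway as tasks succeed, so each speculation check pays only the walk-to-middle cost rather than a fresh copy-and-sort over all durations.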