Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/17112

The unit test "Measurement for SPARK-16929." added in this PR is the measurement. In TaskSetManagerSuite.scala line 1049, if `newAlgorithm=true`, `successfulTaskIdsSet` is used to get the median duration; if `newAlgorithm=false`, the old algorithm (`Arrays.sort`) is used. I measure the time spent getting the median duration in TaskSetManager.scala line 957.

If `tasksNum=1000` (TaskSetManagerSuite.scala line 1043), the measurements are as below:

| newAlgorithm | time cost |
| ------ | ------ |
| false | 5ms, 3ms, 4ms, 3ms, 3ms |
| true | 2ms, 4ms, 2ms, 2ms, 3ms |

If `tasksNum=100000`:

| newAlgorithm | time cost |
| ------ | ------ |
| false | 107ms, 109ms, 103ms, 100ms, 107ms |
| true | 17ms, 14ms, 14ms, 13ms, 14ms |

If `tasksNum=150000`:

| newAlgorithm | time cost |
| ------ | ------ |
| false | 133ms, 146ms, 127ms, 163ms, 114ms |
| true | 14ms, 13ms, 15ms, 16ms, 14ms |

As the tables show, the new algorithm (`TreeSet`) performs better than the old algorithm (`Arrays.sort`). When `tasksNum=100000`, `Arrays.sort` costs over 100ms on every run, while the new algorithm stays below 20ms.
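To illustrate the difference being measured, here is a minimal, hypothetical sketch (not the PR's actual code; class and method names are made up for illustration). The old approach re-sorts a copy of all task durations on every median query; the alternative keeps successful-task durations in an ordered `TreeSet` as tasks finish, so a query only walks to the middle element instead of sorting from scratch:

```java
import java.util.Arrays;
import java.util.TreeSet;

public class MedianDemo {

    // Old approach: re-sort a copy of all durations on every query.
    static long medianBySort(long[] durations) {
        long[] sorted = durations.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // New approach: maintain an ordered set of (duration, taskId) pairs.
    // The taskId component keeps duplicate durations as distinct entries,
    // since a TreeSet would otherwise collapse them.
    static final class MedianTracker {
        private final TreeSet<long[]> set =
            new TreeSet<>((a, b) -> a[0] != b[0]
                ? Long.compare(a[0], b[0])
                : Long.compare(a[1], b[1]));

        void insert(long duration, long taskId) {
            set.add(new long[] {duration, taskId});
        }

        // Walk to the middle element: O(n) per query, but avoids the
        // O(n log n) sort and the full-array copy on every call.
        long median() {
            int mid = set.size() / 2;
            int i = 0;
            for (long[] e : set) {
                if (i++ == mid) return e[0];
            }
            throw new IllegalStateException("empty tracker");
        }
    }

    public static void main(String[] args) {
        long[] durations = {30, 10, 50, 20, 40};
        MedianTracker tracker = new MedianTracker();
        for (int i = 0; i < durations.length; i++) {
            tracker.insert(durations[i], i);
        }
        System.out.println(medianBySort(durations)); // prints 30
        System.out.println(tracker.median());        // prints 30
    }
}
```

The incremental structure wins because insertions happen anyway as tasks succeed, so each speculation check pays only the walk-to-middle cost rather than a fresh copy-and-sort over all durations.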