Yuming Wang created SPARK-19146:
-----------------------------------

             Summary: Drop more elements when stageData.taskData.size > retainedTasks to reduce the number of times on call drop
                 Key: SPARK-19146
                 URL: https://issues.apache.org/jira/browse/SPARK-19146
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.1.0
            Reporter: Yuming Wang

The [{{drop}}|https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala#L412] call in {{JobProgressListener}} performs poorly: in the measurements below, each call takes roughly 150 ms.

* Code modified to time each call:
{code:java}
if (stageData.taskData.size > retainedTasks) {
  // Record the wall-clock time before trimming the task map.
  val start = System.currentTimeMillis()
  stageData.taskData = stageData.taskData.drop(stageData.taskData.size - retainedTasks)
  // Log the elapsed time of this drop call in milliseconds.
  logInfo(s"Time consuming: ${System.currentTimeMillis() - start}")
}
{code}
* Time consumed per call (milliseconds):
{noformat}
17/01/10 14:04:05 INFO JobProgressListener: Time consuming: 156
17/01/10 14:04:05 INFO JobProgressListener: Time consuming: 145
17/01/10 14:04:05 INFO JobProgressListener: Time consuming: 148
17/01/10 14:04:05 INFO JobProgressListener: Time consuming: 159
{noformat}

I propose dropping more elements at once when {{stageData.taskData.size > retainedTasks}}, so that {{drop}} is called less often.
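
To make the proposal concrete, here is a minimal, self-contained sketch of batched dropping. It is an illustration only, not the change that was or will be committed: the object {{BatchedDropSketch}}, the helpers {{numberToRemove}} and {{simulate}}, and the choice of removing at least one tenth of {{retainedTasks}} per call are all assumptions, and plain strings stand in for the real task data.

```scala
import scala.collection.mutable

object BatchedDropSketch {
  // Hypothetical helper: number of oldest entries to remove in one call.
  // Removing at least retained / 10 entries (an assumed batch size) leaves
  // headroom, so the following inserts do not each trigger another drop.
  def numberToRemove(dataSize: Int, retained: Int): Int =
    math.max(retained / 10, dataSize - retained)

  // Simulates inserting `totalTasks` entries into an insertion-ordered map
  // capped at `retainedTasks`, and returns (final map size, number of drop calls).
  def simulate(totalTasks: Int, retainedTasks: Int, batched: Boolean): (Int, Int) = {
    var taskData = mutable.LinkedHashMap.empty[Long, String]
    var dropCalls = 0
    for (taskId <- 0L until totalTasks.toLong) {
      taskData(taskId) = s"task-$taskId" // stand-in for the real task data
      if (taskData.size > retainedTasks) {
        val n =
          if (batched) numberToRemove(taskData.size, retainedTasks)
          else taskData.size - retainedTasks // behaviour shown in the issue
        taskData = taskData.drop(n) // drops the n oldest entries
        dropCalls += 1
      }
    }
    (taskData.size, dropCalls)
  }

  def main(args: Array[String]): Unit = {
    println(s"unbatched: ${simulate(5000, 1000, batched = false)}")
    println(s"batched:   ${simulate(5000, 1000, batched = true)}")
  }
}
```

With 5000 simulated tasks and {{retainedTasks}} set to 1000, the unbatched variant calls {{drop}} 4000 times and the batched variant 40 times; both end with 1000 retained entries. The trade-off is that the map temporarily holds fewer than {{retainedTasks}} entries right after each batched drop.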