GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/4830
[SPARK-6079] Use index to speed up StatusTracker.getJobIdsForGroup()
`StatusTracker.getJobIdsForGroup()` is implemented via a linear scan over a
HashMap rather than using an index, which might be an expensive operation if
there are many (e.g. thousands of) retained jobs.
This patch adds a new map to `JobProgressListener` in order to speed up
these lookups.
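As a rough illustration of the idea (not the code in this patch), the sketch
below keeps a second map from job group to job IDs alongside the per-job map,
so a group lookup no longer has to scan every retained job. The class
`JobGroupIndex`, its fields, and its method names are hypothetical stand-ins
and do not mirror the actual members of `JobProgressListener`.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the listener's bookkeeping.
// None of these names are taken from JobProgressListener itself.
class JobGroupIndex {
  // jobId -> optional job group (simplified per-job data)
  private val jobIdToGroup = mutable.HashMap[Int, Option[String]]()
  // jobGroup -> jobIds: the extra index that avoids the scan
  private val jobGroupToJobIds = mutable.HashMap[String, mutable.HashSet[Int]]()

  // Record a job and, if it belongs to a group, add it to the index.
  def onJobStart(jobId: Int, group: Option[String]): Unit = {
    jobIdToGroup(jobId) = group
    group.foreach { g =>
      jobGroupToJobIds.getOrElseUpdate(g, mutable.HashSet[Int]()) += jobId
    }
  }

  // Drop a job (e.g. when it is no longer retained) and keep the index in sync.
  def onJobRemoved(jobId: Int): Unit = {
    jobIdToGroup.remove(jobId).flatten.foreach { g =>
      jobGroupToJobIds.get(g).foreach { ids =>
        ids -= jobId
        if (ids.isEmpty) jobGroupToJobIds.remove(g)
      }
    }
  }

  // Linear scan over all retained jobs: cost grows with the number of jobs.
  def jobIdsForGroupScan(group: String): Set[Int] =
    jobIdToGroup.iterator.collect { case (id, Some(g)) if g == group => id }.toSet

  // Indexed lookup: one hash lookup, then a copy of that group's IDs.
  def jobIdsForGroup(group: String): Set[Int] =
    jobGroupToJobIds.getOrElse(group, mutable.HashSet.empty[Int]).toSet
}
```

The trade-off in this sketch is that the index must be updated wherever jobs
are added or dropped; if the two maps drift apart, the indexed lookup returns
stale IDs.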
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark statustracker-job-group-indexing
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4830.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4830
commit 97275a7a472ba782c268f391876529fec8fbf2ab
Author: Josh Rosen
Date: 2015-02-28T07:29:23Z
Add jobGroup to jobId index to JobProgressListener
commit 2c49614cc4f92dc1a47044be362db51cfe4da77b
Author: Josh Rosen
Date: 2015-02-28T07:31:27Z
getOrElse