[GitHub] spark pull request #18775: [SPARK-8577][SparkR] Eliminate needless synchroni...
GitHub user SereneAnt opened a pull request:

    https://github.com/apache/spark/pull/18775

[SPARK-8577][SparkR] Eliminate needless synchronization in java-R serialization

## What changes were proposed in this pull request?

Remove surplus synchronized blocks.

## How was this patch tested?

Unit tests run OK.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SereneAnt/spark eliminate_unnecessary_synchronization_in_java-R_serialization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18775.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18775

commit eafad7d0cdf73c9fab9a03f2898039de7b127bc7
Author: iurii.ant <serene...@gmail.com>
Date: 2017-07-30T05:04:06Z

    [SPARK-8577][SparkR] Eliminate needless synchronization in java-R serialization

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
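The PR body does not show the removed blocks, so the following is only a hedged sketch of the general pattern (hypothetical names, not the actual SparkR serializer code): when a stream is created and used by a single thread, wrapping each write in `synchronized` adds lock overhead without adding safety.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}

object SerializationDemo {
  // Serializes one Int into a freshly created, thread-confined stream.
  // Because no other thread can observe `out`, a surplus
  // out.synchronized { ... } around the write would buy no safety,
  // only uncontended-lock overhead on every call.
  def encode(value: Int): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(value) // no synchronized needed: single-threaded access
    out.flush()
    bytes.toByteArray
  }

  def main(args: Array[String]): Unit = {
    // DataOutputStream writes an Int as 4 big-endian bytes.
    println(SerializationDemo.encode(42).length) // prints 4
  }
}
```

Removing such a lock is safe only when thread confinement can actually be established; streams shared across threads still need external synchronization.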
[GitHub] spark pull request #18693: [SPARK-21491][GraphX] Enhance GraphX performance:...
Github user SereneAnt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18693#discussion_r128637486

--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---

    @@ -199,10 +199,10 @@ object PageRank extends Logging {
         require(sources.max <= Int.MaxValue.toLong,
           s"This implementation currently only works for source vertex ids at most ${Int.MaxValue}")
         val zero = Vectors.sparse(sources.size, List()).asBreeze
    -    val sourcesInitMap = sources.zipWithIndex.map { case (vid, i) =>
    +    val sourcesInitMap: Map[VertexId, BV[Double]] = sources.zipWithIndex.map { case (vid, i) =>
           val v = Vectors.sparse(sources.size, Array(i), Array(1.0)).asBreeze
           (vid, v)
    -    }.toMap
    +    }(collection.breakOut)

--- End diff --

Optimization nerds have already used it in the Spark code: https://github.com/apache/spark/blob/c4008480b781379ac0451b9220300d83c054c60d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L517
[GitHub] spark pull request #18693: [SPARK-21491][GraphX] Enhance GraphX performance:...
Github user SereneAnt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18693#discussion_r128635631

--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---

    @@ -199,10 +199,10 @@ object PageRank extends Logging {
         require(sources.max <= Int.MaxValue.toLong,
           s"This implementation currently only works for source vertex ids at most ${Int.MaxValue}")
         val zero = Vectors.sparse(sources.size, List()).asBreeze
    -    val sourcesInitMap = sources.zipWithIndex.map { case (vid, i) =>
    +    val sourcesInitMap: Map[VertexId, BV[Double]] = sources.zipWithIndex.map { case (vid, i) =>
           val v = Vectors.sparse(sources.size, Array(i), Array(1.0)).asBreeze
           (vid, v)
    -    }.toMap
    +    }(collection.breakOut)

--- End diff --

The principles are the same: `sources.zipWithIndex.map {...}` allocates a collection of tuples, and `.toMap` then iterates over them and converts them into the map. `breakOut` is an implementation of `CanBuildFrom`, the implicit parameter passed to the `Traversable.map` method. Once supplied, it lets `map` populate the new map directly, without the intermediate collection of tuples.
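The explanation above can be condensed into a small standalone sketch (simplified names, not the PageRank code itself; note that `collection.breakOut` exists up to Scala 2.12 and was removed in 2.13):

```scala
import scala.collection.breakOut

object BreakOutDemo {
  // Builds a vertex-id -> index map directly while `map` runs.
  // `breakOut` fills the CanBuildFrom implicit that `map` accepts,
  // so no intermediate Seq[(Long, Int)] of tuples is allocated.
  // The explicit Map[Long, Int] result type is required: it is what
  // lets breakOut pick the Map builder instead of a Seq builder.
  def buildIndex(sources: Seq[Long]): Map[Long, Int] =
    sources.zipWithIndex.map { case (vid, i) => (vid, i) }(breakOut)

  def main(args: Array[String]): Unit = {
    val index = buildIndex(Seq(10L, 20L, 30L))
    // Same result as .toMap, minus the intermediate collection.
    assert(index == Seq(10L, 20L, 30L).zipWithIndex.toMap)
    println(index(20L)) // prints 1
  }
}
```

The observable result is identical to `.toMap`; the difference is purely in allocation, which is why the PR expects no behavioral change.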
[GitHub] spark pull request #18693: [SPARK-21491][GraphX] Enhance GraphX performance:...
GitHub user SereneAnt opened a pull request:

    https://github.com/apache/spark/pull/18693

[SPARK-21491][GraphX] Enhance GraphX performance: breakOut instead of .toMap

## What changes were proposed in this pull request?

`Traversable.toMap` changed to `collection.breakOut`, which eliminates the creation of an intermediate tuple collection; see this [Stack Overflow article](https://stackoverflow.com/questions/1715681/scala-2-8-breakout).

## How was this patch tested?

Unit tests run. No performance tests performed yet.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SereneAnt/spark performance_toMap-breakOut

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18693.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18693

commit d7b73b544116314439a536ac944d66a8d7ee1113
Author: iurii.ant <serene...@gmail.com>
Date: 2017-07-20T21:03:43Z

    [SPARK-21491][GraphX] Enhance GraphX performance: eliminate intermediate collections creation with breakOut